Coronaviruses are a family of viruses named after the crown-like spikes on their surface. The novel coronavirus, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), is a contagious respiratory virus that was first identified in Wuhan, China, in December 2019. On 11 February 2020, the World Health Organization designated the name COVID-19 (coronavirus disease 2019) for the disease caused by the novel coronavirus. The first case may be traced back to 17 November 2019, and the outbreak has since become an ongoing pandemic. As of 8 June 2020, more than 7.06 million cases had been reported across 188 countries and territories, resulting in more than 403,000 deaths; more than 3.16 million people had recovered. This notebook explores COVID-19 through data analysis and projections.
I chose the COVID-19 dataset from Our World in Data (https://ourworldindata.org/coronavirus). I will analyze and clean the data, run some exploratory processing, and draw conclusions, with a focus on the global (all-country) data. The data was downloaded from https://covid.ourworldindata.org/data/owid-covid-data.csv.
Data sources:
- Confirmed cases and deaths: data comes from the European Centre for Disease Prevention and Control (ECDC).
- Testing for COVID-19: data is collected by the Our World in Data team from official reports; the source information for every country, and further details, can be found in OWID's post on COVID-19 testing. The testing dataset is updated around twice a week.
- Other variables: data is collected from a variety of sources (United Nations, World Bank, Global Burden of Disease, etc.).
The information in this section is summarized from OWID's COVID-19 GitHub page. All of Our World in Data is completely open access, and all work is licensed under the Creative Commons BY license. More information about the usage of this content can be found on the OWID GitHub page: https://github.com/owid/covid-19-data/tree/master/public/data
According to OWID's COVID-19 GitHub page, the data has been collected, aggregated, and documented by Diana Beltekian, Daniel Gavrilov, Joe Hasell, Bobbie Macdonald, Edouard Mathieu, Esteban Ortiz-Ospina, Hannah Ritchie, and Max Roser.
I created a linear regression model, fit it to the OWID COVID-19 data, and used it to project world deaths for the next 30 days. I used scikit-learn (sklearn) to build the linear regression model, with an 80/20 train/test split, then trained the model and predicted deaths for the next 30 days. To try to improve on the linear regression, I also built an XGBoost model, fit it to the same OWID data, and used it to project world deaths for the next 30 days.
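A minimal sketch of the 80/20 linear-regression projection described above. It assumes the world death series has already been reduced to one row per day (for example by selecting the rows where `location` is `"World"`, an assumption about how OWID labels the aggregate) and that dates have been converted to integer day offsets; `project_linear` is a hypothetical helper name, not part of any library. The XGBoost model is not shown.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split


def project_linear(days, deaths, horizon=30, test_size=0.2):
    """Fit deaths ~ day offset, then project `horizon` days past the last observation.

    Returns (predictions for the next `horizon` days, R^2 on the held-out 20%).
    """
    X = np.asarray(days, dtype=float).reshape(-1, 1)
    y = np.asarray(deaths, dtype=float)
    # shuffle=False keeps the split chronological: the last 20% of days are held out
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, shuffle=False
    )
    model = LinearRegression().fit(X_train, y_train)
    test_r2 = model.score(X_test, y_test)
    # Day offsets for the `horizon` days following the last observed day
    last_day = X[-1, 0]
    future = np.arange(last_day + 1, last_day + 1 + horizon).reshape(-1, 1)
    return model.predict(future), test_r2
```

The chronological split is a deliberate choice: with the default random split, the model would be trained on days that come after some of the days it is tested on, which makes the test score look better than a real 30-day projection would be.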
I will also create a model that predicts the Case Mortality Ratio (CMR) of a country from its life expectancy, percentage of population aged 65 and over, diabetes prevalence (diabetes_prevalence), and cardiovascular death rate (cardiovasc_death_rate).
We decided to use population aged 65 and over and obesity because, worldwide, over 80% of deaths were in the population aged 65 and over, and the CDC has stated that 94% of deaths involved some underlying health condition. We also used life expectancy per country to account for possible deficiencies in the health care system. Johns Hopkins University has listed several conditions, such as heart disease and diabetes, that are known to be exacerbated by obesity. Our hypothesis is that we can predict the mortality ratio of COVID-19 more accurately using both population aged 65 and over and obesity than using population aged 65 and over alone. If so, this may suggest that creating a healthier population is the best way to prevent, in future pandemics, the kind of devastation the world is currently facing.
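A minimal sketch of how the country-level CMR table could be built and fit, assuming CMR is defined as each country's latest `total_deaths / total_cases`, and assuming OWID aggregate rows (World, continents) are the rows with no `continent`. `country_cmr_table` and `fit_cmr_model` are hypothetical helper names. It is meant for the raw data, before the `fillna(0)` step below, because zeros inserted by `fillna` would be treated as real values and `dropna` would not remove them.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

FEATURES = ["life_expectancy", "aged_65_older", "diabetes_prevalence", "cardiovasc_death_rate"]


def country_cmr_table(df):
    """One row per country: the static features plus the latest case mortality ratio."""
    # Assumption: aggregate rows (World, continents, ...) have no continent value
    countries = df[df["continent"].notna()]
    # ISO date strings sort chronologically, so tail(1) picks each country's latest row
    latest = countries.sort_values("date").groupby("location").tail(1)
    latest = latest[latest["total_cases"] > 0].copy()
    latest["cmr"] = latest["total_deaths"] / latest["total_cases"]
    # Drop countries missing any feature instead of treating missing values as 0
    return latest.set_index("location")[FEATURES + ["cmr"]].dropna()


def fit_cmr_model(table):
    """Ordinary least squares: cmr ~ the four country-level features."""
    return LinearRegression().fit(table[FEATURES], table["cmr"])
```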
# os to manipulate files
import os
# pandas to work with DataFrames
import pandas as pd
# numpy for general numerical methods
import numpy as np
import time
from datetime import datetime, date, timedelta
# matplotlib to create graphics
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.dates as dates
from matplotlib import rcParams
%matplotlib inline
matplotlib.style.use('ggplot')
# seaborn for better visualization (seabornInstance is an alias of the same module)
import seaborn as sns
import seaborn as seabornInstance
# plotly for interactive charts
import plotly.graph_objs as go
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
import plotly.express as px
# SciPy for statistics, integration and optimization
from scipy import stats, integrate, optimize
from scipy.stats import zscore
# scikit-learn: models, preprocessing, model selection and metrics
import sklearn
from sklearn import linear_model, metrics, preprocessing, svm
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split, cross_val_score, RandomizedSearchCV, GridSearchCV
from sklearn.metrics import mean_absolute_error, r2_score, mean_squared_error, explained_variance_score
# Gradient-boosting libraries
import lightgbm as lgb
import xgboost as xgb
from xgboost import plot_importance, plot_tree
#sns.set()
#sns.set_context('talk')
# Silence warnings to keep the notebook output readable
import warnings
warnings.filterwarnings('ignore')
# We'll download this file using the urlretrieve function from the urllib.request module.
from urllib.request import urlretrieve
urlretrieve('https://covid.ourworldindata.org/data/owid-covid-data.csv','owid-covid-data.csv')
('owid-covid-data.csv', <http.client.HTTPMessage at 0x205adf36f10>)
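Re-running the cell above downloads the full CSV every time. A small sketch, assuming you only want a fresh copy when the local file is missing (`download_if_missing` is a hypothetical helper name; it uses only `os.path.exists` and the same `urlretrieve` call):

```python
import os
from urllib.request import urlretrieve


def download_if_missing(url, path):
    """Download `url` to `path` only if `path` does not already exist."""
    if not os.path.exists(path):
        urlretrieve(url, path)
    return path


# Usage: download_if_missing('https://covid.ourworldindata.org/data/owid-covid-data.csv',
#                            'owid-covid-data.csv')
```

Because the OWID file is updated regularly, delete the local copy when you want newer data.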
#Read data from a CSV file into a Pandas DataFrame object
world_covid19_df = pd.read_csv('owid-covid-data.csv')
owidcovidcodebook=pd.read_csv('https://raw.githubusercontent.com/owid/covid-19-data/master/public/data/owid-covid-codebook.csv',index_col=0)
owidcovidcodebook
| column | source | description |
|---|---|---|
| iso_code | International Organization for Standardization | ISO 3166-1 alpha-3 – three-letter country codes |
| continent | Our World in Data | Continent of the geographical location |
| location | Our World in Data | Geographical location |
| date | Our World in Data | Date of observation |
| total_cases | COVID-19 Data Repository by the Center for Sys... | Total confirmed cases of COVID-19 |
| new_cases | COVID-19 Data Repository by the Center for Sys... | New confirmed cases of COVID-19 |
| new_cases_smoothed | COVID-19 Data Repository by the Center for Sys... | New confirmed cases of COVID-19 (7-day smoothed) |
| total_deaths | COVID-19 Data Repository by the Center for Sys... | Total deaths attributed to COVID-19 |
| new_deaths | COVID-19 Data Repository by the Center for Sys... | New deaths attributed to COVID-19 |
| new_deaths_smoothed | COVID-19 Data Repository by the Center for Sys... | New deaths attributed to COVID-19 (7-day smoot... |
| total_cases_per_million | COVID-19 Data Repository by the Center for Sys... | Total confirmed cases of COVID-19 per 1,000,00... |
| new_cases_per_million | COVID-19 Data Repository by the Center for Sys... | New confirmed cases of COVID-19 per 1,000,000 ... |
| new_cases_smoothed_per_million | COVID-19 Data Repository by the Center for Sys... | New confirmed cases of COVID-19 (7-day smoothe... |
| total_deaths_per_million | COVID-19 Data Repository by the Center for Sys... | Total deaths attributed to COVID-19 per 1,000,... |
| new_deaths_per_million | COVID-19 Data Repository by the Center for Sys... | New deaths attributed to COVID-19 per 1,000,00... |
| new_deaths_smoothed_per_million | COVID-19 Data Repository by the Center for Sys... | New deaths attributed to COVID-19 (7-day smoot... |
| reproduction_rate | Arroyo Marioli et al. (2020). https://doi.org/... | Real-time estimate of the effective reproducti... |
| icu_patients | European CDC for European countries / UK Gover... | Number of COVID-19 patients in intensive care ... |
| icu_patients_per_million | European CDC for European countries / UK Gover... | Number of COVID-19 patients in intensive care ... |
| hosp_patients | European CDC for European countries / UK Gover... | Number of COVID-19 patients in hospital on a g... |
| hosp_patients_per_million | European CDC for European countries / UK Gover... | Number of COVID-19 patients in hospital on a g... |
| weekly_icu_admissions | European CDC for European countries / UK Gover... | Number of COVID-19 patients newly admitted to ... |
| weekly_icu_admissions_per_million | European CDC for European countries / UK Gover... | Number of COVID-19 patients newly admitted to ... |
| weekly_hosp_admissions | European CDC for European countries / UK Gover... | Number of COVID-19 patients newly admitted to ... |
| weekly_hosp_admissions_per_million | European CDC for European countries / UK Gover... | Number of COVID-19 patients newly admitted to ... |
| total_tests | National government reports | Total tests for COVID-19 |
| new_tests | National government reports | New tests for COVID-19 (only calculated for co... |
| total_tests_per_thousand | National government reports | Total tests for COVID-19 per 1,000 people |
| new_tests_per_thousand | National government reports | New tests for COVID-19 per 1,000 people |
| new_tests_smoothed | National government reports | New tests for COVID-19 (7-day smoothed). For c... |
| new_tests_smoothed_per_thousand | National government reports | New tests for COVID-19 (7-day smoothed) per 1,... |
| positive_rate | National government reports | The share of COVID-19 tests that are positive,... |
| tests_per_case | National government reports | Tests conducted per new confirmed case of COVI... |
| tests_units | National government reports | Units used by the location to report its testi... |
| total_vaccinations | National government reports | Total number of COVID-19 vaccination doses adm... |
| people_vaccinated | National government reports | Total number of people who received at least o... |
| people_fully_vaccinated | National government reports | Total number of people who received all doses ... |
| new_vaccinations | National government reports | New COVID-19 vaccination doses administered (o... |
| new_vaccinations_smoothed | National government reports | New COVID-19 vaccination doses administered (7... |
| total_vaccinations_per_hundred | National government reports | Total number of COVID-19 vaccination doses adm... |
| people_vaccinated_per_hundred | National government reports | Total number of people who received at least o... |
| people_fully_vaccinated_per_hundred | National government reports | Total number of people who received all doses ... |
| new_vaccinations_smoothed_per_million | National government reports | New COVID-19 vaccination doses administered (7... |
| stringency_index | Oxford COVID-19 Government Response Tracker, B... | Government Response Stringency Index: composit... |
| population | United Nations, Department of Economic and Soc... | Population in 2020 |
| population_density | World Bank World Development Indicators, sourc... | Number of people divided by land area, measure... |
| median_age | UN Population Division, World Population Prosp... | Median age of the population, UN projection fo... |
| aged_65_older | World Bank World Development Indicators based ... | Share of the population that is 65 years and o... |
| aged_70_older | United Nations, Department of Economic and Soc... | Share of the population that is 70 years and o... |
| gdp_per_capita | World Bank World Development Indicators, sourc... | Gross domestic product at purchasing power par... |
| extreme_poverty | World Bank World Development Indicators, sourc... | Share of the population living in extreme pove... |
| cardiovasc_death_rate | Global Burden of Disease Collaborative Network... | Death rate from cardiovascular disease in 2017... |
| diabetes_prevalence | World Bank World Development Indicators, sourc... | Diabetes prevalence (% of population aged 20 t... |
| female_smokers | World Bank World Development Indicators, sourc... | Share of women who smoke, most recent year ava... |
| male_smokers | World Bank World Development Indicators, sourc... | Share of men who smoke, most recent year avail... |
| handwashing_facilities | United Nations Statistics Division | Share of the population with basic handwashing... |
| hospital_beds_per_thousand | OECD, Eurostat, World Bank, national governmen... | Hospital beds per 1,000 people, most recent ye... |
| life_expectancy | James C. Riley, Clio Infra, United Nations Pop... | Life expectancy at birth in 2019 |
| human_development_index | United Nations Development Programme (UNDP) | A composite index measuring average achievemen... |
world_covid19_df
| | iso_code | continent | location | date | total_cases | new_cases | new_cases_smoothed | total_deaths | new_deaths | new_deaths_smoothed | ... | gdp_per_capita | extreme_poverty | cardiovasc_death_rate | diabetes_prevalence | female_smokers | male_smokers | handwashing_facilities | hospital_beds_per_thousand | life_expectancy | human_development_index |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | AFG | Asia | Afghanistan | 2020-02-24 | 1.0 | 1.0 | NaN | NaN | NaN | NaN | ... | 1803.987 | NaN | 597.029 | 9.59 | NaN | NaN | 37.746 | 0.5 | 64.83 | 0.511 |
| 1 | AFG | Asia | Afghanistan | 2020-02-25 | 1.0 | 0.0 | NaN | NaN | NaN | NaN | ... | 1803.987 | NaN | 597.029 | 9.59 | NaN | NaN | 37.746 | 0.5 | 64.83 | 0.511 |
| 2 | AFG | Asia | Afghanistan | 2020-02-26 | 1.0 | 0.0 | NaN | NaN | NaN | NaN | ... | 1803.987 | NaN | 597.029 | 9.59 | NaN | NaN | 37.746 | 0.5 | 64.83 | 0.511 |
| 3 | AFG | Asia | Afghanistan | 2020-02-27 | 1.0 | 0.0 | NaN | NaN | NaN | NaN | ... | 1803.987 | NaN | 597.029 | 9.59 | NaN | NaN | 37.746 | 0.5 | 64.83 | 0.511 |
| 4 | AFG | Asia | Afghanistan | 2020-02-28 | 1.0 | 0.0 | NaN | NaN | NaN | NaN | ... | 1803.987 | NaN | 597.029 | 9.59 | NaN | NaN | 37.746 | 0.5 | 64.83 | 0.511 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 76210 | ZWE | Africa | Zimbabwe | 2021-03-16 | 36535.0 | 31.0 | 30.571 | 1507.0 | 3.0 | 2.571 | ... | 1899.775 | 21.4 | 307.846 | 1.82 | 1.6 | 30.7 | 36.791 | 1.7 | 61.49 | 0.571 |
| 76211 | ZWE | Africa | Zimbabwe | 2021-03-17 | 36552.0 | 17.0 | 30.143 | 1508.0 | 1.0 | 2.714 | ... | 1899.775 | 21.4 | 307.846 | 1.82 | 1.6 | 30.7 | 36.791 | 1.7 | 61.49 | 0.571 |
| 76212 | ZWE | Africa | Zimbabwe | 2021-03-18 | 36611.0 | 59.0 | 33.429 | 1509.0 | 1.0 | 2.429 | ... | 1899.775 | 21.4 | 307.846 | 1.82 | 1.6 | 30.7 | 36.791 | 1.7 | 61.49 | 0.571 |
| 76213 | ZWE | Africa | Zimbabwe | 2021-03-19 | 36652.0 | 41.0 | 32.714 | 1510.0 | 1.0 | 2.000 | ... | 1899.775 | 21.4 | 307.846 | 1.82 | 1.6 | 30.7 | 36.791 | 1.7 | 61.49 | 0.571 |
| 76214 | ZWE | Africa | Zimbabwe | 2021-03-20 | 36662.0 | 10.0 | 27.286 | 1510.0 | 0.0 | 1.286 | ... | 1899.775 | 21.4 | 307.846 | 1.82 | 1.6 | 30.7 | 36.791 | 1.7 | 61.49 | 0.571 |
76215 rows × 59 columns
Data from the file is read and stored in a DataFrame object - one of the core data structures in Pandas for storing and working with tabular data. We typically use the _df suffix in the variable names for dataframes.
type(world_covid19_df)
pandas.core.frame.DataFrame
#Get the number of rows & columns as a tuple
world_covid19_df.shape
(76215, 59)
#View basic information about rows, columns & data types
world_covid19_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 76215 entries, 0 to 76214
Data columns (total 59 columns):

| # | Column | Non-Null Count | Dtype |
|---|---|---|---|
| 0 | iso_code | 76215 non-null | object |
| 1 | continent | 72473 non-null | object |
| 2 | location | 76215 non-null | object |
| 3 | date | 76215 non-null | object |
| 4 | total_cases | 74913 non-null | float64 |
| 5 | new_cases | 74911 non-null | float64 |
| 6 | new_cases_smoothed | 73910 non-null | float64 |
| 7 | total_deaths | 65624 non-null | float64 |
| 8 | new_deaths | 65782 non-null | float64 |
| 9 | new_deaths_smoothed | 73910 non-null | float64 |
| 10 | total_cases_per_million | 74505 non-null | float64 |
| 11 | new_cases_per_million | 74503 non-null | float64 |
| 12 | new_cases_smoothed_per_million | 73507 non-null | float64 |
| 13 | total_deaths_per_million | 65229 non-null | float64 |
| 14 | new_deaths_per_million | 65387 non-null | float64 |
| 15 | new_deaths_smoothed_per_million | 73507 non-null | float64 |
| 16 | reproduction_rate | 61298 non-null | float64 |
| 17 | icu_patients | 7930 non-null | float64 |
| 18 | icu_patients_per_million | 7930 non-null | float64 |
| 19 | hosp_patients | 9550 non-null | float64 |
| 20 | hosp_patients_per_million | 9550 non-null | float64 |
| 21 | weekly_icu_admissions | 697 non-null | float64 |
| 22 | weekly_icu_admissions_per_million | 697 non-null | float64 |
| 23 | weekly_hosp_admissions | 1224 non-null | float64 |
| 24 | weekly_hosp_admissions_per_million | 1224 non-null | float64 |
| 25 | new_tests | 34582 non-null | float64 |
| 26 | total_tests | 34364 non-null | float64 |
| 27 | total_tests_per_thousand | 34364 non-null | float64 |
| 28 | new_tests_per_thousand | 34582 non-null | float64 |
| 29 | new_tests_smoothed | 39573 non-null | float64 |
| 30 | new_tests_smoothed_per_thousand | 39573 non-null | float64 |
| 31 | positive_rate | 38264 non-null | float64 |
| 32 | tests_per_case | 37653 non-null | float64 |
| 33 | tests_units | 40920 non-null | object |
| 34 | total_vaccinations | 4978 non-null | float64 |
| 35 | people_vaccinated | 4471 non-null | float64 |
| 36 | people_fully_vaccinated | 2996 non-null | float64 |
| 37 | new_vaccinations | 4246 non-null | float64 |
| 38 | new_vaccinations_smoothed | 7571 non-null | float64 |
| 39 | total_vaccinations_per_hundred | 4978 non-null | float64 |
| 40 | people_vaccinated_per_hundred | 4471 non-null | float64 |
| 41 | people_fully_vaccinated_per_hundred | 2996 non-null | float64 |
| 42 | new_vaccinations_smoothed_per_million | 7571 non-null | float64 |
| 43 | stringency_index | 65065 non-null | float64 |
| 44 | population | 75798 non-null | float64 |
| 45 | population_density | 71134 non-null | float64 |
| 46 | median_age | 69108 non-null | float64 |
| 47 | aged_65_older | 68314 non-null | float64 |
| 48 | aged_70_older | 68719 non-null | float64 |
| 49 | gdp_per_capita | 69348 non-null | float64 |
| 50 | extreme_poverty | 47317 non-null | float64 |
| 51 | cardiovasc_death_rate | 69965 non-null | float64 |
| 52 | diabetes_prevalence | 70872 non-null | float64 |
| 53 | female_smokers | 54974 non-null | float64 |
| 54 | male_smokers | 54166 non-null | float64 |
| 55 | handwashing_facilities | 35112 non-null | float64 |
| 56 | hospital_beds_per_thousand | 64009 non-null | float64 |
| 57 | life_expectancy | 72418 non-null | float64 |
| 58 | human_development_index | 69909 non-null | float64 |

dtypes: float64(54), object(5)
memory usage: 34.3+ MB
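The info() output shows that many columns contain missing (NaN) values. Common approaches for dealing with missing or faulty values are to drop the affected rows or columns, or to replace the missing entries with a fill value such as 0 or the column median.
Dropping rows with NaN values is not sensible here. The data is a historical time series, and nearly every row has at least one missing value (for example, weekly_icu_admissions is non-null in only 697 of 76,215 rows), so deleting rows would throw away most of the history. Instead, the missing numerical values are replaced with 0.
Rather than copying the column names from a Pandas Profiling report, I build the lists of numerical and categorical features directly from each column's dtype.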
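Filling with 0 treats a missing cumulative count (such as `total_deaths`) as zero, so a country with an unreported day in the middle of its series appears to drop to 0 deaths on that day. A common alternative for cumulative columns, not the approach used in this notebook, is to forward-fill within each country and use 0 only for the days before the first report. A minimal sketch, with `fill_time_series` as a hypothetical helper name:

```python
import pandas as pd


def fill_time_series(df, cumulative_cols, group_col="location", date_col="date"):
    """Forward-fill cumulative columns within each group, then fill what remains with 0."""
    out = df.sort_values([group_col, date_col]).copy()
    # Carry each country's last reported value forward; groups never leak into each other
    out[cumulative_cols] = out.groupby(group_col)[cumulative_cols].ffill()
    # Values still missing (before a country's first report) become 0
    out[cumulative_cols] = out[cumulative_cols].fillna(0)
    return out


# Usage: fill_time_series(world_covid19_df, ['total_cases', 'total_deaths'])
```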
# Let's first handle the numerical features with NaN values
Numerical_feat = [feature for feature in world_covid19_df.columns if world_covid19_df[feature].dtypes != 'O']
print('Total numerical features: ', len(Numerical_feat))
print('\nNumerical Features: ', Numerical_feat)
Total numerical features:  54

Numerical Features:  ['total_cases', 'new_cases', 'new_cases_smoothed', 'total_deaths', 'new_deaths', 'new_deaths_smoothed', 'total_cases_per_million', 'new_cases_per_million', 'new_cases_smoothed_per_million', 'total_deaths_per_million', 'new_deaths_per_million', 'new_deaths_smoothed_per_million', 'reproduction_rate', 'icu_patients', 'icu_patients_per_million', 'hosp_patients', 'hosp_patients_per_million', 'weekly_icu_admissions', 'weekly_icu_admissions_per_million', 'weekly_hosp_admissions', 'weekly_hosp_admissions_per_million', 'new_tests', 'total_tests', 'total_tests_per_thousand', 'new_tests_per_thousand', 'new_tests_smoothed', 'new_tests_smoothed_per_thousand', 'positive_rate', 'tests_per_case', 'total_vaccinations', 'people_vaccinated', 'people_fully_vaccinated', 'new_vaccinations', 'new_vaccinations_smoothed', 'total_vaccinations_per_hundred', 'people_vaccinated_per_hundred', 'people_fully_vaccinated_per_hundred', 'new_vaccinations_smoothed_per_million', 'stringency_index', 'population', 'population_density', 'median_age', 'aged_65_older', 'aged_70_older', 'gdp_per_capita', 'extreme_poverty', 'cardiovasc_death_rate', 'diabetes_prevalence', 'female_smokers', 'male_smokers', 'handwashing_facilities', 'hospital_beds_per_thousand', 'life_expectancy', 'human_development_index']
# categorical features
categorical_feat = [feature for feature in world_covid19_df.columns if world_covid19_df[feature].dtypes=='O']
print('Total categorical features: ', len(categorical_feat))
print('\n',categorical_feat)
Total categorical features:  5

 ['iso_code', 'continent', 'location', 'date', 'tests_units']
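The same split can be expressed with pandas' `select_dtypes`. This sketch (with `split_features` as a hypothetical helper name) takes the numeric columns with `include="number"` and treats every other column as categorical; for this dataset, whose columns are only float64 and object, that matches the two lists above. Taking the complement, rather than selecting `object` explicitly, also keeps working if a pandas version stores text columns with a dedicated string dtype.

```python
import pandas as pd


def split_features(df):
    """Return (numerical, categorical) column-name lists, preserving column order."""
    numerical = df.select_dtypes(include="number").columns.tolist()
    # Everything that is not numeric is treated as categorical
    categorical = [col for col in df.columns if col not in numerical]
    return numerical, categorical
```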
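## Replacing the numerical missing values
# Fill NaN with 0 in every numerical column (the median is an alternative for
# columns with outliers, but 0 is used here). Assigning the result back avoids
# chained in-place assignment (df[col].fillna(..., inplace=True)), which may not
# update the original DataFrame in newer pandas versions (Copy-on-Write).
world_covid19_df[Numerical_feat] = world_covid19_df[Numerical_feat].fillna(0)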
world_covid19_df[Numerical_feat].isnull().sum()
total_cases 0
new_cases 0
new_cases_smoothed 0
total_deaths 0
new_deaths 0
new_deaths_smoothed 0
total_cases_per_million 0
new_cases_per_million 0
new_cases_smoothed_per_million 0
total_deaths_per_million 0
new_deaths_per_million 0
new_deaths_smoothed_per_million 0
reproduction_rate 0
icu_patients 0
icu_patients_per_million 0
hosp_patients 0
hosp_patients_per_million 0
weekly_icu_admissions 0
weekly_icu_admissions_per_million 0
weekly_hosp_admissions 0
weekly_hosp_admissions_per_million 0
new_tests 0
total_tests 0
total_tests_per_thousand 0
new_tests_per_thousand 0
new_tests_smoothed 0
new_tests_smoothed_per_thousand 0
positive_rate 0
tests_per_case 0
total_vaccinations 0
people_vaccinated 0
people_fully_vaccinated 0
new_vaccinations 0
new_vaccinations_smoothed 0
total_vaccinations_per_hundred 0
people_vaccinated_per_hundred 0
people_fully_vaccinated_per_hundred 0
new_vaccinations_smoothed_per_million 0
stringency_index 0
population 0
population_density 0
median_age 0
aged_65_older 0
aged_70_older 0
gdp_per_capita 0
extreme_poverty 0
cardiovasc_death_rate 0
diabetes_prevalence 0
female_smokers 0
male_smokers 0
handwashing_facilities 0
hospital_beds_per_thousand 0
life_expectancy 0
human_development_index 0
dtype: int64
world_covid19_df
| | iso_code | continent | location | date | total_cases | new_cases | new_cases_smoothed | total_deaths | new_deaths | new_deaths_smoothed | ... | gdp_per_capita | extreme_poverty | cardiovasc_death_rate | diabetes_prevalence | female_smokers | male_smokers | handwashing_facilities | hospital_beds_per_thousand | life_expectancy | human_development_index |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | AFG | Asia | Afghanistan | 2020-02-24 | 1.0 | 1.0 | 0.000 | 0.0 | 0.0 | 0.000 | ... | 1803.987 | 0.0 | 597.029 | 9.59 | 0.0 | 0.0 | 37.746 | 0.5 | 64.83 | 0.511 |
| 1 | AFG | Asia | Afghanistan | 2020-02-25 | 1.0 | 0.0 | 0.000 | 0.0 | 0.0 | 0.000 | ... | 1803.987 | 0.0 | 597.029 | 9.59 | 0.0 | 0.0 | 37.746 | 0.5 | 64.83 | 0.511 |
| 2 | AFG | Asia | Afghanistan | 2020-02-26 | 1.0 | 0.0 | 0.000 | 0.0 | 0.0 | 0.000 | ... | 1803.987 | 0.0 | 597.029 | 9.59 | 0.0 | 0.0 | 37.746 | 0.5 | 64.83 | 0.511 |
| 3 | AFG | Asia | Afghanistan | 2020-02-27 | 1.0 | 0.0 | 0.000 | 0.0 | 0.0 | 0.000 | ... | 1803.987 | 0.0 | 597.029 | 9.59 | 0.0 | 0.0 | 37.746 | 0.5 | 64.83 | 0.511 |
| 4 | AFG | Asia | Afghanistan | 2020-02-28 | 1.0 | 0.0 | 0.000 | 0.0 | 0.0 | 0.000 | ... | 1803.987 | 0.0 | 597.029 | 9.59 | 0.0 | 0.0 | 37.746 | 0.5 | 64.83 | 0.511 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 76210 | ZWE | Africa | Zimbabwe | 2021-03-16 | 36535.0 | 31.0 | 30.571 | 1507.0 | 3.0 | 2.571 | ... | 1899.775 | 21.4 | 307.846 | 1.82 | 1.6 | 30.7 | 36.791 | 1.7 | 61.49 | 0.571 |
| 76211 | ZWE | Africa | Zimbabwe | 2021-03-17 | 36552.0 | 17.0 | 30.143 | 1508.0 | 1.0 | 2.714 | ... | 1899.775 | 21.4 | 307.846 | 1.82 | 1.6 | 30.7 | 36.791 | 1.7 | 61.49 | 0.571 |
| 76212 | ZWE | Africa | Zimbabwe | 2021-03-18 | 36611.0 | 59.0 | 33.429 | 1509.0 | 1.0 | 2.429 | ... | 1899.775 | 21.4 | 307.846 | 1.82 | 1.6 | 30.7 | 36.791 | 1.7 | 61.49 | 0.571 |
| 76213 | ZWE | Africa | Zimbabwe | 2021-03-19 | 36652.0 | 41.0 | 32.714 | 1510.0 | 1.0 | 2.000 | ... | 1899.775 | 21.4 | 307.846 | 1.82 | 1.6 | 30.7 | 36.791 | 1.7 | 61.49 | 0.571 |
| 76214 | ZWE | Africa | Zimbabwe | 2021-03-20 | 36662.0 | 10.0 | 27.286 | 1510.0 | 0.0 | 1.286 | ... | 1899.775 | 21.4 | 307.846 | 1.82 | 1.6 | 30.7 | 36.791 | 1.7 | 61.49 | 0.571 |
76215 rows × 59 columns
#Store the clean DataFrame in a CSV file
world_covid19_df.to_csv('covid19_df_master.csv',index=False)
covid_df=pd.read_csv('covid19_df_master.csv')
#covid_df.hist(figsize=(15,15));
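A CSV file does not store column dtypes, so after the round trip above the `date` column comes back as plain strings. One option, sketched here with `read_master` as a hypothetical helper name, is to parse the column while reading with `read_csv`'s `parse_dates` parameter, which would make a separate `pd.to_datetime` conversion unnecessary:

```python
import pandas as pd


def read_master(path_or_buffer):
    """Read the cleaned CSV and parse the 'date' column as datetime64."""
    return pd.read_csv(path_or_buffer, parse_dates=["date"])


# Usage: covid_df = read_master('covid19_df_master.csv')
```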
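Each column contains values of a single data type. For the numeric columns, the .describe method shows statistics such as the mean, standard deviation, minimum/maximum values, and the number of non-empty values (count). Note that after filling missing values with 0, count equals the total number of rows for every column, and the inserted zeros pull down statistics such as the minimum and the quartiles.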
covid_df.describe().style.background_gradient(cmap="CMRmap_r")
| | total_cases | new_cases | new_cases_smoothed | total_deaths | new_deaths | new_deaths_smoothed | total_cases_per_million | new_cases_per_million | new_cases_smoothed_per_million | total_deaths_per_million | new_deaths_per_million | new_deaths_smoothed_per_million | reproduction_rate | icu_patients | icu_patients_per_million | hosp_patients | hosp_patients_per_million | weekly_icu_admissions | weekly_icu_admissions_per_million | weekly_hosp_admissions | weekly_hosp_admissions_per_million | new_tests | total_tests | total_tests_per_thousand | new_tests_per_thousand | new_tests_smoothed | new_tests_smoothed_per_thousand | positive_rate | tests_per_case | total_vaccinations | people_vaccinated | people_fully_vaccinated | new_vaccinations | new_vaccinations_smoothed | total_vaccinations_per_hundred | people_vaccinated_per_hundred | people_fully_vaccinated_per_hundred | new_vaccinations_smoothed_per_million | stringency_index | population | population_density | median_age | aged_65_older | aged_70_older | gdp_per_capita | extreme_poverty | cardiovasc_death_rate | diabetes_prevalence | female_smokers | male_smokers | handwashing_facilities | hospital_beds_per_thousand | life_expectancy | human_development_index |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 |
| mean | 664429.456551 | 5129.566568 | 5063.112630 | 17076.098655 | 114.393531 | 113.178646 | 7873.626153 | 66.505456 | 65.390387 | 160.782550 | 1.209663 | 1.194381 | 0.817522 | 104.170898 | 2.427559 | 581.768497 | 19.617366 | 2.352516 | 0.175219 | 60.790989 | 1.757699 | 17638.245293 | 2187700.430243 | 80.015992 | 0.720085 | 19339.450148 | 0.782154 | 0.044136 | 80.619000 | 503611.229863 | 314467.331864 | 100983.102303 | 16332.660697 | 16596.512301 | 0.524075 | 0.339373 | 0.126228 | 243.715712 | 50.208742 | 129789029.585436 | 312.872929 | 27.700474 | 7.880594 | 5.023743 | 17411.789829 | 8.249080 | 236.141847 | 7.253194 | 7.601972 | 23.205725 | 23.489918 | 2.548185 | 69.505214 | 0.667534 |
| std | 4698196.905872 | 32191.869926 | 31633.387699 | 108136.089442 | 674.709265 | 652.256822 | 15662.046239 | 168.372816 | 141.179154 | 315.421600 | 3.623517 | 2.756024 | 0.509057 | 1038.799768 | 10.996649 | 4734.364944 | 85.827497 | 57.094248 | 3.765821 | 1554.711563 | 32.119539 | 102929.078224 | 15326886.721775 | 265.393666 | 3.099126 | 100759.298246 | 2.652978 | 0.082895 | 624.528191 | 7953373.735704 | 4766946.098613 | 1766444.270360 | 256803.603832 | 218893.084722 | 4.590345 | 2.888190 | 1.552936 | 1490.438593 | 29.046831 | 694541376.973729 | 1579.306203 | 12.419133 | 6.482936 | 4.370634 | 19609.485356 | 16.968226 | 133.738715 | 4.294111 | 10.021965 | 18.657876 | 33.329193 | 2.519296 | 17.534691 | 0.246698 |
| min | 0.000000 | -74347.000000 | -6223.000000 | 0.000000 | -1918.000000 | -232.143000 | 0.000000 | -2153.437000 | -276.825000 | 0.000000 | -76.445000 | -10.921000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | -239172.000000 | 0.000000 | 0.000000 | -23.010000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| 25% | 673.500000 | 1.000000 | 4.429000 | 10.000000 | 0.000000 | 0.000000 | 123.578000 | 0.061000 | 0.741000 | 1.603000 | 0.000000 | 0.000000 | 0.520000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 30.560000 | 2540916.000000 | 24.282000 | 19.600000 | 3.008000 | 1.783000 | 2896.913000 | 0.000000 | 140.448000 | 4.610000 | 0.000000 | 0.000000 | 0.000000 | 0.700000 | 66.470000 | 0.555000 |
| 50% | 7821.000000 | 55.000000 | 63.000000 | 130.000000 | 1.000000 | 0.857000 | 1036.897000 | 5.861000 | 7.536000 | 19.611000 | 0.019000 | 0.099000 | 0.960000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 89.000000 | 0.007000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 55.560000 | 10099270.000000 | 80.080000 | 29.000000 | 5.440000 | 3.212000 | 10727.146000 | 0.500000 | 233.070000 | 6.930000 | 2.100000 | 24.500000 | 0.000000 | 2.000000 | 74.160000 | 0.737000 |
| 75% | 87043.000000 | 662.000000 | 666.785500 | 1710.000000 | 11.000000 | 11.571000 | 7189.766500 | 55.940000 | 61.936000 | 139.170000 | 0.804000 | 0.975000 | 1.130000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 3895.000000 | 378435.000000 | 30.874500 | 0.375000 | 5638.500000 | 0.534000 | 0.052000 | 18.200000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 73.150000 | 36910558.000000 | 204.430000 | 38.000000 | 13.260000 | 8.353000 | 25063.846000 | 4.500000 | 318.949000 | 9.750000 | 13.000000 | 37.700000 | 47.782000 | 3.600000 | 78.490000 | 0.828000 |
| max | 122813796.000000 | 880902.000000 | 739564.429000 | 2709639.000000 | 17895.000000 | 14424.000000 | 148592.506000 | 8652.658000 | 2648.773000 | 2327.774000 | 218.329000 | 63.140000 | 6.740000 | 30028.000000 | 189.561000 | 129812.000000 | 1042.535000 | 4037.019000 | 276.325000 | 116385.000000 | 2656.911000 | 2945871.000000 | 355058178.000000 | 3857.663000 | 327.086000 | 1858135.000000 | 59.929000 | 0.742000 | 44258.700000 | 436370147.000000 | 258958639.000000 | 99942889.000000 | 16650022.000000 | 11248913.000000 | 155.330000 | 89.160000 | 66.170000 | 54264.000000 | 100.000000 | 7794798729.000000 | 20546.766000 | 48.200000 | 27.049000 | 18.493000 | 116935.600000 | 77.600000 | 724.417000 | 30.530000 | 44.000000 | 78.100000 | 98.999000 | 13.800000 | 86.750000 | 0.957000 |
While we have looked at overall numbers for cases, tests, positive rate, etc., it would also be useful to study these numbers on a month-by-month basis. The date column comes in handy here, as pandas provides many utilities for working with dates.
#covid_df['date'] = pd.to_datetime(covid_df.date)
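After conversion with pd.to_datetime, the column has the datatype datetime64, and different parts of the date can be extracted into separate columns using the DatetimeIndex class. These cells are left commented out here, which is why covid_df.info() below still reports date with the object dtype.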
#covid_df['year'] = pd.DatetimeIndex(covid_df.date).year
#covid_df['month'] = pd.DatetimeIndex(covid_df.date).month
#covid_df['day'] = pd.DatetimeIndex(covid_df.date).day
#covid_df['weekday'] = pd.DatetimeIndex(covid_df.date).weekday
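As a minimal sketch of the month-by-month idea, assuming a frame with the same date, location and new_cases columns as covid_df (the small `sample` frame below is made up for illustration), the `.dt` accessor extracts date parts without building a `DatetimeIndex`, and a groupby then totals new cases per month:

```python
import pandas as pd

# Hypothetical toy data using the same column names as covid_df
sample = pd.DataFrame({
    "location": ["A", "A", "A", "B", "B"],
    "date": ["2020-03-30", "2020-03-31", "2020-04-01", "2020-03-31", "2020-04-02"],
    "new_cases": [5.0, 7.0, 10.0, 1.0, 2.0],
})

# Convert the string dates to datetime64 so the .dt accessor is available
sample["date"] = pd.to_datetime(sample["date"])

# Extract date parts into separate columns (same result as the DatetimeIndex approach)
sample["year"] = sample["date"].dt.year
sample["month"] = sample["date"].dt.month
sample["weekday"] = sample["date"].dt.weekday  # Monday=0 ... Sunday=6

# Total new cases per (year, month) across all locations
monthly_cases = sample.groupby(["year", "month"])["new_cases"].sum()
print(monthly_cases)
```

On the real data the same groupby can be extended with location to get per-country monthly totals.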
covid_df.head(10)
| | iso_code | continent | location | date | total_cases | new_cases | new_cases_smoothed | total_deaths | new_deaths | new_deaths_smoothed | ... | gdp_per_capita | extreme_poverty | cardiovasc_death_rate | diabetes_prevalence | female_smokers | male_smokers | handwashing_facilities | hospital_beds_per_thousand | life_expectancy | human_development_index |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | AFG | Asia | Afghanistan | 2020-02-24 | 1.0 | 1.0 | 0.000 | 0.0 | 0.0 | 0.0 | ... | 1803.987 | 0.0 | 597.029 | 9.59 | 0.0 | 0.0 | 37.746 | 0.5 | 64.83 | 0.511 |
| 1 | AFG | Asia | Afghanistan | 2020-02-25 | 1.0 | 0.0 | 0.000 | 0.0 | 0.0 | 0.0 | ... | 1803.987 | 0.0 | 597.029 | 9.59 | 0.0 | 0.0 | 37.746 | 0.5 | 64.83 | 0.511 |
| 2 | AFG | Asia | Afghanistan | 2020-02-26 | 1.0 | 0.0 | 0.000 | 0.0 | 0.0 | 0.0 | ... | 1803.987 | 0.0 | 597.029 | 9.59 | 0.0 | 0.0 | 37.746 | 0.5 | 64.83 | 0.511 |
| 3 | AFG | Asia | Afghanistan | 2020-02-27 | 1.0 | 0.0 | 0.000 | 0.0 | 0.0 | 0.0 | ... | 1803.987 | 0.0 | 597.029 | 9.59 | 0.0 | 0.0 | 37.746 | 0.5 | 64.83 | 0.511 |
| 4 | AFG | Asia | Afghanistan | 2020-02-28 | 1.0 | 0.0 | 0.000 | 0.0 | 0.0 | 0.0 | ... | 1803.987 | 0.0 | 597.029 | 9.59 | 0.0 | 0.0 | 37.746 | 0.5 | 64.83 | 0.511 |
| 5 | AFG | Asia | Afghanistan | 2020-02-29 | 1.0 | 0.0 | 0.143 | 0.0 | 0.0 | 0.0 | ... | 1803.987 | 0.0 | 597.029 | 9.59 | 0.0 | 0.0 | 37.746 | 0.5 | 64.83 | 0.511 |
| 6 | AFG | Asia | Afghanistan | 2020-03-01 | 1.0 | 0.0 | 0.143 | 0.0 | 0.0 | 0.0 | ... | 1803.987 | 0.0 | 597.029 | 9.59 | 0.0 | 0.0 | 37.746 | 0.5 | 64.83 | 0.511 |
| 7 | AFG | Asia | Afghanistan | 2020-03-02 | 1.0 | 0.0 | 0.000 | 0.0 | 0.0 | 0.0 | ... | 1803.987 | 0.0 | 597.029 | 9.59 | 0.0 | 0.0 | 37.746 | 0.5 | 64.83 | 0.511 |
| 8 | AFG | Asia | Afghanistan | 2020-03-03 | 2.0 | 1.0 | 0.143 | 0.0 | 0.0 | 0.0 | ... | 1803.987 | 0.0 | 597.029 | 9.59 | 0.0 | 0.0 | 37.746 | 0.5 | 64.83 | 0.511 |
| 9 | AFG | Asia | Afghanistan | 2020-03-04 | 4.0 | 2.0 | 0.429 | 0.0 | 0.0 | 0.0 | ... | 1803.987 | 0.0 | 597.029 | 9.59 | 0.0 | 0.0 | 37.746 | 0.5 | 64.83 | 0.511 |
10 rows × 59 columns
sum(covid_df.duplicated())
0
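`duplicated()` with no arguments only flags rows that are identical in every column. For a daily panel like this one, a stricter check is that each (location, date) pair appears at most once. A small sketch on a made-up frame:

```python
import pandas as pd

# Hypothetical toy frame: the two "A" rows on 2020-03-01 differ only in new_cases
sample = pd.DataFrame({
    "location": ["A", "A", "B"],
    "date": ["2020-03-01", "2020-03-01", "2020-03-01"],
    "new_cases": [3.0, 4.0, 1.0],
})

# Rows identical across all columns
full_row_dupes = sample.duplicated().sum()
# Rows repeating the same country and day, regardless of the other values
key_dupes = sample.duplicated(subset=["location", "date"]).sum()

print(full_row_dupes, key_dupes)
```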
covid_df.isnull().sum()
iso_code 0
continent 3742
location 0
date 0
total_cases 0
new_cases 0
new_cases_smoothed 0
total_deaths 0
new_deaths 0
new_deaths_smoothed 0
total_cases_per_million 0
new_cases_per_million 0
new_cases_smoothed_per_million 0
total_deaths_per_million 0
new_deaths_per_million 0
new_deaths_smoothed_per_million 0
reproduction_rate 0
icu_patients 0
icu_patients_per_million 0
hosp_patients 0
hosp_patients_per_million 0
weekly_icu_admissions 0
weekly_icu_admissions_per_million 0
weekly_hosp_admissions 0
weekly_hosp_admissions_per_million 0
new_tests 0
total_tests 0
total_tests_per_thousand 0
new_tests_per_thousand 0
new_tests_smoothed 0
new_tests_smoothed_per_thousand 0
positive_rate 0
tests_per_case 0
tests_units 35295
total_vaccinations 0
people_vaccinated 0
people_fully_vaccinated 0
new_vaccinations 0
new_vaccinations_smoothed 0
total_vaccinations_per_hundred 0
people_vaccinated_per_hundred 0
people_fully_vaccinated_per_hundred 0
new_vaccinations_smoothed_per_million 0
stringency_index 0
population 0
population_density 0
median_age 0
aged_65_older 0
aged_70_older 0
gdp_per_capita 0
extreme_poverty 0
cardiovasc_death_rate 0
diabetes_prevalence 0
female_smokers 0
male_smokers 0
handwashing_facilities 0
hospital_beds_per_thousand 0
life_expectancy 0
human_development_index 0
dtype: int64
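The only gaps left are in continent (3742 rows) and tests_units. In the OWID file, rows without a continent are typically aggregates such as World or the continents themselves, whose iso_code starts with `OWID_`; that is worth confirming on the actual data before dropping them. A minimal sketch on a made-up frame (the codes below are illustrative) that lists those locations and keeps only country-level rows:

```python
import pandas as pd

# Hypothetical toy frame mimicking country rows and aggregate rows
sample = pd.DataFrame({
    "iso_code": ["AFG", "OWID_WRL", "OWID_ASI", "FRA"],
    "continent": ["Asia", None, None, "Europe"],
    "location": ["Afghanistan", "World", "Asia", "France"],
})

# Which locations have no continent?
missing = sample[sample["continent"].isna()]
print(missing["location"].unique())

# Keep country-level rows only
countries_only = sample.dropna(subset=["continent"])
print(countries_only)
```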
covid_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 76215 entries, 0 to 76214
Data columns (total 59 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 iso_code 76215 non-null object
1 continent 72473 non-null object
2 location 76215 non-null object
3 date 76215 non-null object
4 total_cases 76215 non-null float64
5 new_cases 76215 non-null float64
6 new_cases_smoothed 76215 non-null float64
7 total_deaths 76215 non-null float64
8 new_deaths 76215 non-null float64
9 new_deaths_smoothed 76215 non-null float64
10 total_cases_per_million 76215 non-null float64
11 new_cases_per_million 76215 non-null float64
12 new_cases_smoothed_per_million 76215 non-null float64
13 total_deaths_per_million 76215 non-null float64
14 new_deaths_per_million 76215 non-null float64
15 new_deaths_smoothed_per_million 76215 non-null float64
16 reproduction_rate 76215 non-null float64
17 icu_patients 76215 non-null float64
18 icu_patients_per_million 76215 non-null float64
19 hosp_patients 76215 non-null float64
20 hosp_patients_per_million 76215 non-null float64
21 weekly_icu_admissions 76215 non-null float64
22 weekly_icu_admissions_per_million 76215 non-null float64
23 weekly_hosp_admissions 76215 non-null float64
24 weekly_hosp_admissions_per_million 76215 non-null float64
25 new_tests 76215 non-null float64
26 total_tests 76215 non-null float64
27 total_tests_per_thousand 76215 non-null float64
28 new_tests_per_thousand 76215 non-null float64
29 new_tests_smoothed 76215 non-null float64
30 new_tests_smoothed_per_thousand 76215 non-null float64
31 positive_rate 76215 non-null float64
32 tests_per_case 76215 non-null float64
33 tests_units 40920 non-null object
34 total_vaccinations 76215 non-null float64
35 people_vaccinated 76215 non-null float64
36 people_fully_vaccinated 76215 non-null float64
37 new_vaccinations 76215 non-null float64
38 new_vaccinations_smoothed 76215 non-null float64
39 total_vaccinations_per_hundred 76215 non-null float64
40 people_vaccinated_per_hundred 76215 non-null float64
41 people_fully_vaccinated_per_hundred 76215 non-null float64
42 new_vaccinations_smoothed_per_million 76215 non-null float64
43 stringency_index 76215 non-null float64
44 population 76215 non-null float64
45 population_density 76215 non-null float64
46 median_age 76215 non-null float64
47 aged_65_older 76215 non-null float64
48 aged_70_older 76215 non-null float64
49 gdp_per_capita 76215 non-null float64
50 extreme_poverty 76215 non-null float64
51 cardiovasc_death_rate 76215 non-null float64
52 diabetes_prevalence 76215 non-null float64
53 female_smokers 76215 non-null float64
54 male_smokers 76215 non-null float64
55 handwashing_facilities 76215 non-null float64
56 hospital_beds_per_thousand 76215 non-null float64
57 life_expectancy 76215 non-null float64
58 human_development_index 76215 non-null float64
dtypes: float64(54), object(5)
memory usage: 34.3+ MB
covid_df.describe().T.style.background_gradient(cmap="CMRmap_r")
| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| total_cases | 76215.000000 | 664429.456551 | 4698196.905872 | 0.000000 | 673.500000 | 7821.000000 | 87043.000000 | 122813796.000000 |
| new_cases | 76215.000000 | 5129.566568 | 32191.869926 | -74347.000000 | 1.000000 | 55.000000 | 662.000000 | 880902.000000 |
| new_cases_smoothed | 76215.000000 | 5063.112630 | 31633.387699 | -6223.000000 | 4.429000 | 63.000000 | 666.785500 | 739564.429000 |
| total_deaths | 76215.000000 | 17076.098655 | 108136.089442 | 0.000000 | 10.000000 | 130.000000 | 1710.000000 | 2709639.000000 |
| new_deaths | 76215.000000 | 114.393531 | 674.709265 | -1918.000000 | 0.000000 | 1.000000 | 11.000000 | 17895.000000 |
| new_deaths_smoothed | 76215.000000 | 113.178646 | 652.256822 | -232.143000 | 0.000000 | 0.857000 | 11.571000 | 14424.000000 |
| total_cases_per_million | 76215.000000 | 7873.626153 | 15662.046239 | 0.000000 | 123.578000 | 1036.897000 | 7189.766500 | 148592.506000 |
| new_cases_per_million | 76215.000000 | 66.505456 | 168.372816 | -2153.437000 | 0.061000 | 5.861000 | 55.940000 | 8652.658000 |
| new_cases_smoothed_per_million | 76215.000000 | 65.390387 | 141.179154 | -276.825000 | 0.741000 | 7.536000 | 61.936000 | 2648.773000 |
| total_deaths_per_million | 76215.000000 | 160.782550 | 315.421600 | 0.000000 | 1.603000 | 19.611000 | 139.170000 | 2327.774000 |
| new_deaths_per_million | 76215.000000 | 1.209663 | 3.623517 | -76.445000 | 0.000000 | 0.019000 | 0.804000 | 218.329000 |
| new_deaths_smoothed_per_million | 76215.000000 | 1.194381 | 2.756024 | -10.921000 | 0.000000 | 0.099000 | 0.975000 | 63.140000 |
| reproduction_rate | 76215.000000 | 0.817522 | 0.509057 | 0.000000 | 0.520000 | 0.960000 | 1.130000 | 6.740000 |
| icu_patients | 76215.000000 | 104.170898 | 1038.799768 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 30028.000000 |
| icu_patients_per_million | 76215.000000 | 2.427559 | 10.996649 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 189.561000 |
| hosp_patients | 76215.000000 | 581.768497 | 4734.364944 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 129812.000000 |
| hosp_patients_per_million | 76215.000000 | 19.617366 | 85.827497 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1042.535000 |
| weekly_icu_admissions | 76215.000000 | 2.352516 | 57.094248 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 4037.019000 |
| weekly_icu_admissions_per_million | 76215.000000 | 0.175219 | 3.765821 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 276.325000 |
| weekly_hosp_admissions | 76215.000000 | 60.790989 | 1554.711563 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 116385.000000 |
| weekly_hosp_admissions_per_million | 76215.000000 | 1.757699 | 32.119539 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 2656.911000 |
| new_tests | 76215.000000 | 17638.245293 | 102929.078224 | -239172.000000 | 0.000000 | 0.000000 | 3895.000000 | 2945871.000000 |
| total_tests | 76215.000000 | 2187700.430243 | 15326886.721775 | 0.000000 | 0.000000 | 0.000000 | 378435.000000 | 355058178.000000 |
| total_tests_per_thousand | 76215.000000 | 80.015992 | 265.393666 | 0.000000 | 0.000000 | 0.000000 | 30.874500 | 3857.663000 |
| new_tests_per_thousand | 76215.000000 | 0.720085 | 3.099126 | -23.010000 | 0.000000 | 0.000000 | 0.375000 | 327.086000 |
| new_tests_smoothed | 76215.000000 | 19339.450148 | 100759.298246 | 0.000000 | 0.000000 | 89.000000 | 5638.500000 | 1858135.000000 |
| new_tests_smoothed_per_thousand | 76215.000000 | 0.782154 | 2.652978 | 0.000000 | 0.000000 | 0.007000 | 0.534000 | 59.929000 |
| positive_rate | 76215.000000 | 0.044136 | 0.082895 | 0.000000 | 0.000000 | 0.000000 | 0.052000 | 0.742000 |
| tests_per_case | 76215.000000 | 80.619000 | 624.528191 | 0.000000 | 0.000000 | 0.000000 | 18.200000 | 44258.700000 |
| total_vaccinations | 76215.000000 | 503611.229863 | 7953373.735704 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 436370147.000000 |
| people_vaccinated | 76215.000000 | 314467.331864 | 4766946.098613 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 258958639.000000 |
| people_fully_vaccinated | 76215.000000 | 100983.102303 | 1766444.270360 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 99942889.000000 |
| new_vaccinations | 76215.000000 | 16332.660697 | 256803.603832 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 16650022.000000 |
| new_vaccinations_smoothed | 76215.000000 | 16596.512301 | 218893.084722 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 11248913.000000 |
| total_vaccinations_per_hundred | 76215.000000 | 0.524075 | 4.590345 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 155.330000 |
| people_vaccinated_per_hundred | 76215.000000 | 0.339373 | 2.888190 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 89.160000 |
| people_fully_vaccinated_per_hundred | 76215.000000 | 0.126228 | 1.552936 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 66.170000 |
| new_vaccinations_smoothed_per_million | 76215.000000 | 243.715712 | 1490.438593 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 54264.000000 |
| stringency_index | 76215.000000 | 50.208742 | 29.046831 | 0.000000 | 30.560000 | 55.560000 | 73.150000 | 100.000000 |
| population | 76215.000000 | 129789029.585436 | 694541376.973729 | 0.000000 | 2540916.000000 | 10099270.000000 | 36910558.000000 | 7794798729.000000 |
| population_density | 76215.000000 | 312.872929 | 1579.306203 | 0.000000 | 24.282000 | 80.080000 | 204.430000 | 20546.766000 |
| median_age | 76215.000000 | 27.700474 | 12.419133 | 0.000000 | 19.600000 | 29.000000 | 38.000000 | 48.200000 |
| aged_65_older | 76215.000000 | 7.880594 | 6.482936 | 0.000000 | 3.008000 | 5.440000 | 13.260000 | 27.049000 |
| aged_70_older | 76215.000000 | 5.023743 | 4.370634 | 0.000000 | 1.783000 | 3.212000 | 8.353000 | 18.493000 |
| gdp_per_capita | 76215.000000 | 17411.789829 | 19609.485356 | 0.000000 | 2896.913000 | 10727.146000 | 25063.846000 | 116935.600000 |
| extreme_poverty | 76215.000000 | 8.249080 | 16.968226 | 0.000000 | 0.000000 | 0.500000 | 4.500000 | 77.600000 |
| cardiovasc_death_rate | 76215.000000 | 236.141847 | 133.738715 | 0.000000 | 140.448000 | 233.070000 | 318.949000 | 724.417000 |
| diabetes_prevalence | 76215.000000 | 7.253194 | 4.294111 | 0.000000 | 4.610000 | 6.930000 | 9.750000 | 30.530000 |
| female_smokers | 76215.000000 | 7.601972 | 10.021965 | 0.000000 | 0.000000 | 2.100000 | 13.000000 | 44.000000 |
| male_smokers | 76215.000000 | 23.205725 | 18.657876 | 0.000000 | 0.000000 | 24.500000 | 37.700000 | 78.100000 |
| handwashing_facilities | 76215.000000 | 23.489918 | 33.329193 | 0.000000 | 0.000000 | 0.000000 | 47.782000 | 98.999000 |
| hospital_beds_per_thousand | 76215.000000 | 2.548185 | 2.519296 | 0.000000 | 0.700000 | 2.000000 | 3.600000 | 13.800000 |
| life_expectancy | 76215.000000 | 69.505214 | 17.534691 | 0.000000 | 66.470000 | 74.160000 | 78.490000 | 86.750000 |
| human_development_index | 76215.000000 | 0.667534 | 0.246698 | 0.000000 | 0.555000 | 0.737000 | 0.828000 | 0.957000 |
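The summary shows negative minimums for several daily columns (for example new_cases at -74,347, new_deaths at -1,918 and new_tests at -239,172), which most likely reflect downward revisions of previously reported totals. A small sketch, on a made-up frame, for listing which numeric columns contain negative values and how many:

```python
import pandas as pd

# Hypothetical toy frame with one negative daily value in two columns
sample = pd.DataFrame({
    "new_cases": [10.0, -4.0, 6.0],
    "new_deaths": [1.0, 0.0, -1.0],
    "population": [100.0, 100.0, 100.0],
})

# Count negative values in each numeric column, then keep only columns that have any
neg_counts = (sample.select_dtypes("number") < 0).sum()
neg_counts = neg_counts[neg_counts > 0]
print(neg_counts)
```

Whether to keep, clip or drop such rows is an analysis choice; this only makes them visible.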
data_popu=covid_df.groupby('continent').sum()
plt.figure(figsize = (20,18))
sns.set_style('ticks')
# seaborn averages the repeated daily rows per country; population is constant per country, so each bar shows that country's population
#population of each country in Asia
plt.subplot(221)
sns.barplot(y='location', x='population', data=covid_df[covid_df['continent'] == 'Asia']).set_title('Population by country in Asia')
#population of each country in North America
plt.subplot(222)
sns.barplot(y='location', x='population', data=covid_df[covid_df['continent'] == 'North America']).set_title('Population by country in North America')
#population of each country in South America
plt.subplot(223)
sns.barplot(y='location', x='population', data=covid_df[covid_df['continent'] == 'South America']).set_title('Population by country in South America')
#population of each country in Europe
plt.subplot(224)
sns.barplot(y='location', x='population', data=covid_df[covid_df['continent'] == 'Europe']).set_title('Population by country in Europe')
plt.subplots_adjust(left=0.1,
                    bottom=0.1,
                    right=0.9,
                    top=0.9,
                    wspace=0.4,
                    hspace=0.2)
plt.show()
plt.figure(figsize = (20,18))
sns.set_style('ticks')
#population of each country in Africa
plt.subplot(221)
sns.barplot(y='location', x='population', data=covid_df[covid_df['continent'] == 'Africa']).set_title('Population by country in Africa')
#population of each country in Oceania
plt.subplot(222)
sns.barplot(y='location', x='population', data=covid_df[covid_df['continent'] == 'Oceania']).set_title('Population by country in Oceania')
plt.subplots_adjust(left=0.1,
                    bottom=0.1,
                    right=0.9,
                    top=0.9,
                    wspace=0.4,
                    hspace=0.2)
plt.show()
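seaborn's barplot averages the repeated daily rows for each country (and by default also bootstraps an error bar over them), which gives the right bar length only because population does not change from day to day. A lighter alternative is to reduce the data to one row per country before plotting. The helper `population_by_country` and the toy frame below are hypothetical, for illustration:

```python
import pandas as pd

def population_by_country(df: pd.DataFrame, continent: str) -> pd.DataFrame:
    """Return one row per country in `continent`, sorted by population (largest first)."""
    subset = df[df["continent"] == continent]
    # population is repeated on every daily row, so keep a single value per country
    per_country = subset.groupby("location", as_index=False)["population"].max()
    return per_country.sort_values("population", ascending=False, ignore_index=True)

# Hypothetical toy data: several daily rows per country
sample = pd.DataFrame({
    "continent": ["Oceania"] * 4 + ["Asia"],
    "location": ["Fiji", "Fiji", "Australia", "Australia", "Japan"],
    "population": [896444.0, 896444.0, 25499881.0, 25499881.0, 1.26e8],
})

print(population_by_country(sample, "Oceania"))
```

The returned frame can be passed to `sns.barplot(y='location', x='population', data=...)` as before, so each bar is drawn from a single row and comes out sorted.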
#10 most populous countries in Asia
top_popu_asia=covid_df[covid_df['continent'] == 'Asia']
print("The 10 most populous countries in Asia are:\n",top_popu_asia.groupby(['continent','location'])['population'].max().nlargest(10))
The 10 most populous countries in Asia are:
continent location
Asia China 1.439324e+09
India 1.380004e+09
Indonesia 2.735236e+08
Pakistan 2.208923e+08
Bangladesh 1.646894e+08
Japan 1.264765e+08
Philippines 1.095811e+08
Vietnam 9.733858e+07
Turkey 8.433907e+07
Iran 8.399295e+07
Name: population, dtype: float64
#10 most populous countries in North America
top_popu_north_america=covid_df[covid_df['continent'] == 'North America']
print("The 10 most populous countries in North America are:\n",top_popu_north_america.groupby(['continent','location'])['population'].max().nlargest(10))
The 10 most populous countries in North America are:
continent location
North America United States 331002647.0
Mexico 128932753.0
Canada 37742157.0
Guatemala 17915567.0
Haiti 11402533.0
Cuba 11326616.0
Dominican Republic 10847904.0
Honduras 9904608.0
Nicaragua 6624554.0
El Salvador 6486201.0
Name: population, dtype: float64
#10 most populous countries in South America
top_popu_south_america=covid_df[covid_df['continent'] == 'South America']
print("The 10 most populous countries in South America are:\n",top_popu_south_america.groupby(['continent','location'])['population'].max().nlargest(10))
The 10 most populous countries in South America are:
continent location
South America Brazil 212559409.0
Colombia 50882884.0
Argentina 45195777.0
Peru 32971846.0
Venezuela 28435943.0
Chile 19116209.0
Ecuador 17643060.0
Bolivia 11673029.0
Paraguay 7132530.0
Uruguay 3473727.0
Name: population, dtype: float64
#10 most populous countries in Europe
top_popu_europe=covid_df[covid_df['continent'] == 'Europe']
print("The 10 most populous countries in Europe are:\n",top_popu_europe.groupby(['continent','location'])['population'].max().nlargest(10))
The 10 most populous countries in Europe are:
continent location
Europe Russia 145934460.0
Germany 83783945.0
France 68147687.0
United Kingdom 67886004.0
Italy 60461828.0
Spain 46754783.0
Ukraine 43733759.0
Poland 37846605.0
Romania 19237682.0
Netherlands 17134873.0
Name: population, dtype: float64
#10 least populous countries in Europe
data=covid_df[covid_df['continent'] == 'Europe']
data.groupby(['continent','location'])['population'].min().nsmallest(10)
continent location
Europe Vatican 809.0
Gibraltar 33691.0
San Marino 33938.0
Liechtenstein 38137.0
Monaco 39244.0
Faeroe Islands 48865.0
Guernsey 67052.0
Andorra 77265.0
Isle of Man 85032.0
Jersey 101073.0
Name: population, dtype: float64
#10 most populous countries in Africa
top_popu_africa=covid_df[covid_df['continent'] == 'Africa']
print("The 10 most populous countries in Africa are:\n",top_popu_africa.groupby(['continent','location'])['population'].max().nlargest(10))
The 10 most populous countries in Africa are:
continent location
Africa Nigeria 206139587.0
Ethiopia 114963583.0
Egypt 102334403.0
Democratic Republic of Congo 89561404.0
Tanzania 59734213.0
South Africa 59308690.0
Kenya 53771300.0
Uganda 45741000.0
Algeria 43851043.0
Sudan 43849269.0
Name: population, dtype: float64
#most populous countries in Oceania (only 9 Oceania countries appear in the data)
top_popu_oceania=covid_df[covid_df['continent'] == 'Oceania']
print("The most populous countries in Oceania (up to 10) are:\n",top_popu_oceania.groupby(['continent','location'])['population'].max().nlargest(10))
The most populous countries in Oceania (up to 10) are:
continent location
Oceania Australia 25499881.0
Papua New Guinea 8947027.0
New Zealand 4822233.0
Fiji 896444.0
Solomon Islands 686878.0
Vanuatu 307150.0
Samoa 198410.0
Micronesia (country) 115021.0
Marshall Islands 59194.0
Name: population, dtype: float64
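The six cells above repeat the same filter, groupby and nlargest for each continent. A sketch that computes one population per country once and then takes the top n within every continent in a single pass; the toy frame is made up for illustration:

```python
import pandas as pd

# Hypothetical toy data: repeated daily rows, as in covid_df
sample = pd.DataFrame({
    "continent": ["Asia", "Asia", "Asia", "Europe", "Europe", "Europe"],
    "location": ["China", "China", "Japan", "Malta", "Germany", "Germany"],
    "population": [1439e6, 1439e6, 126e6, 0.5e6, 83e6, 83e6],
})

# One population value per (continent, country)
per_country = sample.groupby(["continent", "location"])["population"].max()

# The n most populous countries within each continent
n = 10
top_n = {
    continent: group.droplevel("continent").nlargest(n)
    for continent, group in per_country.groupby(level="continent")
}

for continent, top in top_n.items():
    print(f"The {len(top)} most populous countries in {continent}:")
    print(top, end="\n\n")
```

Swapping `nlargest` for `nsmallest` gives the least populous countries, as in the Europe cell that uses `.min().nsmallest(10)`. Rows with no continent are skipped automatically, because groupby drops missing keys by default.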
continent_populations_df = covid_df.groupby(['continent'])['population'].sum()
continent_populations_df
continent
Africa 5.072744e+11
Asia 1.890449e+12
Europe 3.055237e+11
North America 2.476494e+11
Oceania 1.641354e+10
South America 1.701631e+11
Name: population, dtype: float64
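These totals are not continental populations: Asia's value of about 1.9 × 10^12 is far larger than the world population of roughly 7.8 × 10^9 (the maximum of the population column in the summary above). Because population is repeated on every daily row, summing it counts each country once per day of data. A minimal sketch of the fix on a made-up frame takes one value per country first and then adds the countries up:

```python
import pandas as pd

# Hypothetical toy data: Fiji appears on three days, Australia on two
sample = pd.DataFrame({
    "continent": ["Oceania"] * 5,
    "location": ["Fiji", "Fiji", "Fiji", "Australia", "Australia"],
    "population": [896444.0] * 3 + [25499881.0] * 2,
})

# Naive sum: each country's population is counted once per daily row
naive = sample.groupby("continent")["population"].sum()

# One value per country first, then sum the countries within each continent
continent_population = (
    sample.groupby(["continent", "location"])["population"].max()
          .groupby(level="continent")
          .sum()
)
print(naive["Oceania"], continent_population["Oceania"])
```

Applied to covid_df, the same two-step groupby should give continent totals in the billions rather than the trillions.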
#Mean and max of total_cases, total_deaths, total_tests and total_vaccinations for each country in Asia
data_total=covid_df[covid_df['continent'] == 'Asia']
data_total.groupby(['continent','location']).agg({'total_cases': ['mean','max'],'total_deaths':['mean','max'],'total_tests':['mean','max'],'total_vaccinations':['mean','max']}).style.background_gradient(cmap="CMRmap_r")
| continent | location | total_cases (mean) | total_cases (max) | total_deaths (mean) | total_deaths (max) | total_tests (mean) | total_tests (max) | total_vaccinations (mean) | total_vaccinations (max) |
|---|---|---|---|---|---|---|---|---|---|
| Asia | Afghanistan | 32800.465473 | 56093.000000 | 1243.191816 | 2462.000000 | 0.000000 | 0.000000 | 159.079284 | 54000.000000 |
| Asia | Armenia | 72477.093506 | 183127.000000 | 1282.075325 | 3332.000000 | 167847.475325 | 788953.000000 | 0.000000 | 0.000000 |
| Asia | Azerbaijan | 81129.828571 | 245490.000000 | 1060.303896 | 3339.000000 | 0.000000 | 0.000000 | 8624.532468 | 453586.000000 |
| Asia | Bahrain | 55488.445013 | 135326.000000 | 202.920716 | 498.000000 | 1134590.184143 | 3368947.000000 | 24745.964194 | 640104.000000 |
| Asia | Bangladesh | 290780.796345 | 568706.000000 | 4216.869452 | 8668.000000 | 1760640.963446 | 4349615.000000 | 166798.676240 | 4760747.000000 |
| Asia | Bhutan | 316.507895 | 869.000000 | 0.189474 | 1.000000 | 159224.386842 | 570591.000000 | 0.000000 | 0.000000 |
| Asia | Brunei | 147.522546 | 205.000000 | 2.477454 | 3.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| Asia | Cambodia | 276.434368 | 1578.000000 | 0.028640 | 2.000000 | 0.000000 | 0.000000 | 3161.455847 | 170659.000000 |
| Asia | China | 85702.589623 | 101518.000000 | 4236.117925 | 4839.000000 | 589622.641509 | 160000000.000000 | 816007.075472 | 70000000.000000 |
| Asia | Georgia | 76535.095116 | 277218.000000 | 885.956298 | 3691.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| Asia | Hong Kong | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 532269.370000 | 9918161.000000 | 30933.000000 | 330600.000000 |
| Asia | India | 4672001.079327 | 11599130.000000 | 70708.079327 | 159755.000000 | 72672294.795673 | 231370546.000000 | 1949145.658654 | 44603841.000000 |
| Asia | Indonesia | 392300.888021 | 1455788.000000 | 12082.231771 | 39447.000000 | 1865629.171875 | 7781193.000000 | 317826.398438 | 7835357.000000 |
| Asia | Iran | 595969.260101 | 1793805.000000 | 27403.255051 | 61724.000000 | 2319587.229798 | 11844528.000000 | 32.828283 | 10000.000000 |
| Asia | Iraq | 298074.859335 | 789390.000000 | 6821.565217 | 13969.000000 | 1884651.910486 | 7498360.000000 | 0.000000 | 0.000000 |
| Asia | Israel | 239729.043038 | 827220.000000 | 1853.668354 | 6082.000000 | 4214791.891139 | 14437280.000000 | 1153293.881013 | 9686464.000000 |
| Asia | Japan | 115264.412736 | 455212.000000 | 2036.846698 | 8802.000000 | 2125313.466981 | 8633325.000000 | 7521.330189 | 578835.000000 |
| Asia | Jordan | 111707.718016 | 526666.000000 | 1381.872063 | 5788.000000 | 1220804.791123 | 5317747.000000 | 1947.830287 | 241868.000000 |
| Asia | Kazakhstan | 120345.053619 | 281798.000000 | 1556.005362 | 3201.000000 | 3126088.214477 | 8211056.000000 | 627.857909 | 109995.000000 |
| Asia | Kuwait | 89033.342711 | 217933.000000 | 544.427110 | 1215.000000 | 724549.557545 | 1941949.000000 | 1464.194373 | 360000.000000 |
| Asia | Kyrgyzstan | 43689.866848 | 87389.000000 | 862.141304 | 1498.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| Asia | Laos | 27.541436 | 49.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 112.519337 | 40732.000000 |
| Asia | Lebanon | 92094.444162 | 436575.000000 | 943.406091 | 5715.000000 | 0.000000 | 0.000000 | 4564.992386 | 135349.000000 |
| Asia | Macao | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 9632.441176 | 27637.000000 |
| Asia | Malaysia | 59239.719715 | 331713.000000 | 282.427553 | 1229.000000 | 1793750.712589 | 7029970.000000 | 9899.733967 | 399525.000000 |
| Asia | Maldives | 8346.089947 | 22373.000000 | 28.335979 | 65.000000 | 171012.134921 | 588650.000000 | 9865.481481 | 212711.000000 |
| Asia | Mongolia | 749.364362 | 4806.000000 | 0.473404 | 4.000000 | 296730.375000 | 2061669.000000 | 3560.569149 | 204121.000000 |
| Asia | Myanmar | 49369.986072 | 142212.000000 | 1093.337047 | 3204.000000 | 595327.573816 | 2482290.000000 | 1069.080780 | 380000.000000 |
| Asia | Nepal | 101298.733967 | 275829.000000 | 735.420428 | 3016.000000 | 831554.337292 | 2218722.000000 | 5401.249406 | 1600000.000000 |
| Asia | Northern Cyprus | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1774.222222 | 11000.000000 |
| Asia | Oman | 76451.038363 | 149135.000000 | 758.416880 | 1620.000000 | 0.000000 | 0.000000 | 4577.020460 | 109844.000000 |
| Asia | Pakistan | 283551.843590 | 626802.000000 | 5985.046154 | 13843.000000 | 3156972.805128 | 9691087.000000 | 1289.430769 | 350000.000000 |
| Asia | Palestine | 58333.060367 | 221391.000000 | 590.986877 | 2406.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| Asia | Philippines | 232215.197115 | 656056.000000 | 4641.959135 | 12930.000000 | 3012351.947115 | 8938938.000000 | 1554.935096 | 240297.000000 |
| Asia | Qatar | 100689.639896 | 173206.000000 | 158.971503 | 272.000000 | 698088.658031 | 1648555.000000 | 3645.077720 | 510000.000000 |
| Asia | Saudi Arabia | 245337.190104 | 384653.000000 | 3584.747396 | 6602.000000 | 5915013.812500 | 14503622.000000 | 106634.174479 | 2999798.000000 |
| Asia | Singapore | 40674.288416 | 60184.000000 | 21.498818 | 30.000000 | 373765.257683 | 8055714.000000 | 9986.940898 | 792423.000000 |
| Asia | South Korea | 29387.498824 | 98665.000000 | 513.983529 | 1696.000000 | 2012652.098824 | 7176600.000000 | 21106.602353 | 676900.000000 |
| Asia | Sri Lanka | 18167.811456 | 89655.000000 | 92.408115 | 544.000000 | 541971.038186 | 2309954.000000 | 48432.090692 | 824523.000000 |
| Asia | Syria | 5446.060440 | 17240.000000 | 324.148352 | 1153.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| Asia | Taiwan | 505.732558 | 1005.000000 | 6.051163 | 10.000000 | 85052.120930 | 183386.000000 | 0.000000 | 0.000000 |
| Asia | Tajikistan | 9405.175385 | 13308.000000 | 71.316923 | 90.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| Asia | Thailand | 5812.058824 | 27594.000000 | 49.151584 | 90.000000 | 864887.124434 | 2894666.000000 | 215.031674 | 53842.000000 |
| Asia | Timor | 40.447802 | 271.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| Asia | Turkey | 835822.845333 | 2992694.000000 | 11075.733333 | 29959.000000 | 12276750.498667 | 35787480.000000 | 982927.338667 | 13029754.000000 |
| Asia | United Arab Emirates | 115633.733813 | 438638.000000 | 428.280576 | 1433.000000 | 10221291.400480 | 34913667.000000 | 768939.309353 | 7181056.000000 |
| Asia | Uzbekistan | 42940.285714 | 81339.000000 | 335.913747 | 622.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| Asia | Vietnam | 897.810875 | 2572.000000 | 18.139480 | 35.000000 | 92152.125296 | 1469955.000000 | 353.342790 | 30971.000000 |
| Asia | Yemen | 1629.878261 | 3278.000000 | 460.168116 | 737.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
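Passing a dict of lists to `agg` produces two-level column labels (total_cases/mean, total_cases/max, ...), which export awkwardly as flat tables. pandas named aggregation gives one flat column per statistic instead; the output column names below are illustrative choices and the toy frame is made up. Because the total_* columns are cumulative, the max is normally each country's most recent total, while the mean averages the running total over time:

```python
import pandas as pd

# Hypothetical toy data: cumulative totals over three days for one country
sample = pd.DataFrame({
    "continent": ["Asia"] * 3,
    "location": ["Afghanistan"] * 3,
    "total_cases": [1.0, 2.0, 6.0],
    "total_deaths": [0.0, 0.0, 1.0],
})

# Named aggregation: each keyword becomes one flat output column
summary = sample.groupby(["continent", "location"]).agg(
    total_cases_mean=("total_cases", "mean"),
    total_cases_max=("total_cases", "max"),
    total_deaths_mean=("total_deaths", "mean"),
    total_deaths_max=("total_deaths", "max"),
)
print(summary)
```

The same keywords extend naturally to total_tests and total_vaccinations, and `.style.background_gradient(...)` can still be applied to the result.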
#Mean and max of total_cases, total_deaths, total_tests and total_vaccinations for each country in Europe
data_total=covid_df[covid_df['continent'] == 'Europe']
data_total.groupby(['continent','location']).agg({'total_cases': ['mean','max'],'total_deaths':['mean','max'],'total_tests':['mean','max'],'total_vaccinations':['mean','max']}).style.background_gradient(cmap="CMRmap_r")
| continent | location | total_cases (mean) | total_cases (max) | total_deaths (mean) | total_deaths (max) | total_tests (mean) | total_tests (max) | total_vaccinations (mean) | total_vaccinations (max) |
|---|---|---|---|---|---|---|---|---|---|
| Europe | Albania | 27496.753846 | 120541.000000 | 542.628205 | 2133.000000 | 121225.482051 | 497742.000000 | 358.548718 | 33369.000000 |
| Europe | Andorra | 3850.583333 | 11481.000000 | 61.265625 | 113.000000 | 4918.734375 | 162071.000000 | 52.361979 | 4914.000000 |
| Europe | Austria | 143926.628205 | 511440.000000 | 2512.548718 | 9052.000000 | 3093094.464103 | 19277527.000000 | 88327.330769 | 1239208.000000 |
| Europe | Belarus | 105512.888889 | 309293.000000 | 812.855297 | 2148.000000 | 526215.829457 | 5178695.000000 | 131.638243 | 30000.000000 |
| Europe | Belgium | 271334.396594 | 827941.000000 | 11233.172749 | 22650.000000 | 3408751.282238 | 10322890.000000 | 100065.007299 | 1323086.000000 |
| Europe | Bosnia and Herzegovina | 47326.703412 | 151337.000000 | 1662.503937 | 5773.000000 | 213201.230971 | 733032.000000 | 0.000000 | 0.000000 |
| Europe | Bulgaria | 78287.687831 | 302480.000000 | 2912.796296 | 11966.000000 | 529062.817460 | 1915561.000000 | 25370.275132 | 366547.000000 |
| Europe | Croatia | 73024.387179 | 256805.000000 | 1434.812821 | 5753.000000 | 388907.628205 | 1461537.000000 | 8242.951282 | 365082.000000 |
| Europe | Cyprus | 9437.673740 | 41882.000000 | 63.230769 | 242.000000 | 602203.002653 | 3044737.000000 | 1503.787798 | 129438.000000 |
| Europe | Czechia | 323394.974293 | 1459406.000000 | 5140.961440 | 24530.000000 | 0.000000 | 0.000000 | 95911.832905 | 1330675.000000 |
| Europe | Denmark | 62748.303483 | 225540.000000 | 851.870647 | 2400.000000 | 4858681.820896 | 19408441.000000 | 79063.845771 | 935185.000000 |
| Europe | Estonia | 14311.394872 | 94028.000000 | 150.310256 | 780.000000 | 323216.153846 | 1071562.000000 | 12716.071795 | 220882.000000 |
| Europe | Faeroe Islands | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 2280.510204 | 9342.000000 |
| Europe | Finland | 17232.170264 | 71123.000000 | 344.328537 | 805.000000 | 1115747.035971 | 3734426.000000 | 48220.817746 | 812039.000000 |
| Europe | France | 1138253.457346 | 4277183.000000 | 36869.215640 | 92119.000000 | 0.000000 | 0.000000 | 479563.850711 | 7927771.000000 |
| Europe | Germany | 707616.916468 | 2669233.000000 | 18247.847255 | 74706.000000 | 2368382.556086 | 47511887.000000 | 769901.868735 | 10473852.000000 |
| Europe | Gibraltar | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 24921.085714 | 52331.000000 |
| Europe | Greece | 55152.391858 | 235611.000000 | 1747.959288 | 7421.000000 | 1564811.786260 | 5944445.000000 | 97515.226463 | 1436491.000000 |
| Europe | Guernsey | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 2345.041667 | 29383.000000 |
| Europe | Hungary | 123337.568063 | 560971.000000 | 3960.986911 | 18068.000000 | 1059831.206806 | 3751450.000000 | 127689.473822 | 2038133.000000 |
| Europe | Iceland | 3295.149100 | 6097.000000 | 15.313625 | 29.000000 | 132377.637532 | 290437.000000 | 2127.336761 | 52604.000000 |
| Europe | Ireland | 66086.012821 | 229831.000000 | 1903.846154 | 4585.000000 | 1286446.820513 | 3786972.000000 | 44057.589744 | 639586.000000 |
| Europe | Isle of Man | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 15432.666667 | 33336.000000 |
| Europe | Italy | 875866.281928 | 3356331.000000 | 42791.809639 | 104642.000000 | 13117156.293976 | 45894515.000000 | 586145.142169 | 7708889.000000 |
| Europe | Jersey | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 5268.440000 | 45758.000000 |
| Europe | Kosovo | 23996.236559 | 80295.000000 | 642.766129 | 1752.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| Europe | Latvia | 18431.976923 | 97149.000000 | 324.702564 | 1821.000000 | 453499.400000 | 1671987.000000 | 7445.574359 | 102320.000000 |
| Europe | Liechtenstein | 788.968586 | 2627.000000 | 13.811518 | 56.000000 | 0.000000 | 0.000000 | 66.479058 | 4215.000000 |
| Europe | Lithuania | 50231.503876 | 208650.000000 | 739.069767 | 3464.000000 | 841210.359173 | 2264247.000000 | 31723.829457 | 399863.000000 |
| Europe | Luxembourg | 19337.785166 | 58955.000000 | 232.685422 | 714.000000 | 815707.519182 | 2280826.000000 | 3365.920716 | 70339.000000 |
| Europe | Malta | 6024.591687 | 27904.000000 | 83.332518 | 369.000000 | 256224.574572 | 772206.000000 | 9488.924205 | 140331.000000 |
| Europe | Moldova | 68266.780423 | 214203.000000 | 1542.140212 | 4531.000000 | 0.000000 | 0.000000 | 268.455026 | 18593.000000 |
| Europe | Monaco | 507.873057 | 2173.000000 | 5.569948 | 27.000000 | 0.000000 | 0.000000 | 93.505181 | 18081.000000 |
| Europe | Montenegro | 22491.146341 | 86782.000000 | 313.325203 | 1194.000000 | 0.000000 | 0.000000 | 140.181572 | 7298.000000 |
| Europe | Netherlands | 337849.726343 | 1211447.000000 | 7574.644501 | 16395.000000 | 365637.345269 | 7184008.000000 | 19897.066496 | 1887726.000000 |
| Europe | North Macedonia | 34666.272494 | 118736.000000 | 1097.457584 | 3448.000000 | 199062.809769 | 578312.000000 | 13.624679 | 5300.000000 |
| Europe | Norway | 24624.683673 | 86362.000000 | 300.262755 | 648.000000 | 1371938.622449 | 4274069.000000 | 57095.441327 | 758514.000000 |
| Europe | Poland | 511944.769634 | 2036700.000000 | 11805.534031 | 49159.000000 | 3690909.712042 | 10639405.000000 | 341900.761780 | 4983494.000000 |
| Europe | Portugal | 217262.483117 | 817080.000000 | 4287.057143 | 16762.000000 | 3046625.184416 | 8671839.000000 | 107622.909091 | 1325266.000000 |
| Europe | Romania | 260677.143959 | 892848.000000 | 7062.737789 | 22132.000000 | 1941957.696658 | 6445769.000000 | 199985.588689 | 2426191.000000 |
| Europe | Russia | 1449070.896386 | 4397816.000000 | 26344.183133 | 93090.000000 | 39430406.373494 | 116724405.000000 | 313259.971084 | 8306498.000000 |
| Europe | San Marino | 1303.175258 | 4356.000000 | 45.201031 | 79.000000 | 0.000000 | 0.000000 | 178.778351 | 7923.000000 |
| Europe | Serbia | 126718.807198 | 546896.000000 | 1370.326478 | 4900.000000 | 1148183.123393 | 3218400.000000 | 135700.277635 | 2163593.000000 |
| Europe | Slovakia | 77540.144737 | 347944.000000 | 1320.644737 | 8978.000000 | 2540943.868421 | 21061465.000000 | 57538.084211 | 718369.000000 |
| Europe | Slovenia | 46344.532688 | 205509.000000 | 963.242131 | 3967.000000 | 407755.222760 | 2313303.000000 | 22672.167070 | 286151.000000 |
| Europe | Spain | 982592.074879 | 3212332.000000 | 32955.731884 | 72910.000000 | 1521440.183575 | 34785710.000000 | 325315.521739 | 5993363.000000 |
| Europe | Sweden | 186260.644928 | 744272.000000 | 5708.398551 | 13262.000000 | 0.000000 | 0.000000 | 24882.821256 | 1293923.000000 |
| Europe | Switzerland | 167805.045024 | 580609.000000 | 3434.443128 | 10203.000000 | 1515279.417062 | 4699813.000000 | 81363.518957 | 1176875.000000 |
| Europe | Ukraine | 444491.861619 | 1584972.000000 | 8394.133159 | 31344.000000 | 1838510.422977 | 7573737.000000 | 1884.511749 | 108310.000000 |
| United Kingdom | 1142338.734940 | 4304839.000000 | 48199.161446 | 126359.000000 | 25552517.004819 | 107584947.000000 | 2517574.086747 | 28985958.000000 | |
| Vatican | 17.268421 | 27.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | |
# For each country in North America, show the mean and max of total_cases, total_deaths, total_tests and total_vaccinations
data_total=covid_df[covid_df['continent'] == 'North America']
data_total.groupby(['continent','location']).agg({'total_cases': ['mean','max'],'total_deaths':['mean','max'],'total_tests':['mean','max'],'total_vaccinations':['mean','max']}).style.background_gradient(cmap="CMRmap_r")
| continent | location | total_cases (mean) | total_cases (max) | total_deaths (mean) | total_deaths (max) | total_tests (mean) | total_tests (max) | total_vaccinations (mean) | total_vaccinations (max) |
|---|---|---|---|---|---|---|---|---|---|
| North America | Anguilla | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 417.727273 | 5348.000000 |
| North America | Antigua and Barbuda | 159.664879 | 1033.000000 | 4.772118 | 28.000000 | 0.000000 | 0.000000 | 133.621984 | 25677.000000 |
| North America | Bahamas | 3864.724324 | 8800.000000 | 85.605405 | 186.000000 | 0.000000 | 0.000000 | 0.297297 | 110.000000 |
| North America | Barbados | 562.886179 | 3533.000000 | 9.544715 | 39.000000 | 0.000000 | 0.000000 | 3205.623306 | 58214.000000 |
| North America | Belize | 4180.584022 | 12400.000000 | 96.738292 | 316.000000 | 0.000000 | 0.000000 | 220.085399 | 15006.000000 |
| North America | Bermuda | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 2816.246154 | 30481.000000 |
| North America | Canada | 265005.852381 | 935932.000000 | 9559.716667 | 22635.000000 | 2676569.345238 | 26250445.000000 | 261255.052381 | 3862685.000000 |
| North America | Cayman Islands | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 4332.476190 | 39145.000000 |
| North America | Costa Rica | 78812.997368 | 211903.000000 | 1007.471053 | 2896.000000 | 209663.102632 | 623493.000000 | 3394.173684 | 248082.000000 |
| North America | Cuba | 10733.294118 | 65962.000000 | 124.740642 | 392.000000 | 774317.021390 | 2730305.000000 | 0.000000 | 0.000000 |
| North America | Dominica | 52.634615 | 156.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 57.052198 | 13565.000000 |
| North America | Dominican Republic | 99771.244156 | 248989.000000 | 1566.116883 | 3269.000000 | 427707.618182 | 1263135.000000 | 10422.646753 | 675000.000000 |
| North America | El Salvador | 26734.095368 | 62531.000000 | 780.168937 | 1975.000000 | 162169.149864 | 717882.000000 | 339.269755 | 41512.000000 |
| North America | Greenland | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 226.882353 | 5130.000000 |
| North America | Grenada | 54.082418 | 154.000000 | 0.211538 | 1.000000 | 0.000000 | 0.000000 | 32.285714 | 8606.000000 |
| North America | Guatemala | 73889.419598 | 187659.000000 | 2637.716080 | 6685.000000 | 309333.449749 | 995537.000000 | 980.753769 | 66399.000000 |
| North America | Haiti | 7147.516393 | 12700.000000 | 163.997268 | 251.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| North America | Honduras | 70863.101333 | 181931.000000 | 1897.346667 | 4430.000000 | 0.000000 | 0.000000 | 201.789333 | 37317.000000 |
| North America | Jamaica | 7204.192000 | 34665.000000 | 143.296000 | 524.000000 | 48877.738667 | 243363.000000 | 75.186667 | 16096.000000 |
| North America | Mexico | 670242.629213 | 2193639.000000 | 63141.078652 | 197827.000000 | 1581521.896629 | 5399163.000000 | 264364.887640 | 5459014.000000 |
| North America | Montserrat | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 51.096774 | 932.000000 |
| North America | Nicaragua | 3943.362398 | 6582.000000 | 113.700272 | 176.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| North America | Panama | 127367.350133 | 350665.000000 | 2316.992042 | 6042.000000 | 627499.267905 | 2035210.000000 | 12984.485411 | 297165.000000 |
| North America | Saint Kitts and Nevis | 22.116343 | 44.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 40.119114 | 7580.000000 |
| North America | Saint Lucia | 501.247312 | 4113.000000 | 5.475806 | 55.000000 | 0.000000 | 0.000000 | 106.801075 | 20247.000000 |
| North America | Saint Vincent and the Grenadines | 269.379032 | 1696.000000 | 0.919355 | 9.000000 | 0.000000 | 0.000000 | 49.287634 | 9383.000000 |
| North America | Trinidad and Tobago | 3476.973118 | 7839.000000 | 63.903226 | 140.000000 | 30720.443548 | 103786.000000 | 3.846774 | 991.000000 |
| North America | Turks and Caicos Islands | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 214.433333 | 6433.000000 |
| North America | United States | 9004664.099057 | 29785285.000000 | 197045.080189 | 541926.000000 | 113629930.792453 | 355058178.000000 | 8889080.884434 | 121441497.000000 |
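Passing `agg` a list of functions per column produces columns with two header levels (for example `total_cases` / `mean`). That is why the exported table above needs a two-row header. Below is a minimal sketch of flattening those levels into single column names. It runs on a small made-up DataFrame; `toy_df` is hypothetical and stands in for `covid_df`, so running it does not overwrite the real data.

```python
import pandas as pd

# Hypothetical toy data standing in for covid_df (values are made up).
toy_df = pd.DataFrame({
    "continent": ["North America"] * 4,
    "location": ["Canada", "Canada", "Mexico", "Mexico"],
    "total_cases": [100.0, 300.0, 50.0, 250.0],
    "total_deaths": [1.0, 5.0, 2.0, 8.0],
})

summary = (
    toy_df[toy_df["continent"] == "North America"]
    .groupby(["continent", "location"])
    .agg({"total_cases": ["mean", "max"], "total_deaths": ["mean", "max"]})
)

# agg with a list of functions yields two-level column labels such as
# ("total_cases", "mean"); join them into flat names like "total_cases_mean".
summary.columns = ["_".join(col) for col in summary.columns]
summary = summary.reset_index()
print(summary)
```

With flat column names the result can be exported or styled without a multi-row header.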
Coronavirus is continuing to spread across the world, with almost 100 million confirmed cases in 191 countries and more than two million deaths. The virus has been detected in nearly every country, as the maps below show.
worldwide_spread=covid_df[["continent","location","total_cases","total_tests","date","total_deaths","positive_rate","total_vaccinations","people_fully_vaccinated"]]
df=worldwide_spread.dropna(axis=0)
df.sort_values("total_tests",ascending=False)
df_loc=df.groupby(['location']).max()
df_loc.drop(["date"],axis=1,inplace=True)
df_loc
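# Confirmed cases as a percentage of tests (i.e. cases per 100 tests);
# countries with no recorded tests are left as NaN
for i, r in df_loc.iterrows():
    if r["total_tests"] > 0:
        df_loc.loc[i, "test per confirmed(%)"] = (r["total_cases"] / r["total_tests"]) * 100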
df_covid=df_loc.reset_index()
df_covid.style.background_gradient(cmap="CMRmap_r")
| | location | continent | total_cases | total_tests | total_deaths | positive_rate | total_vaccinations | people_fully_vaccinated | test per confirmed(%) |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | Asia | 56093.000000 | 0.000000 | 2462.000000 | 0.000000 | 54000.000000 | 0.000000 | nan |
| 1 | Albania | Europe | 120541.000000 | 497742.000000 | 2133.000000 | 0.409000 | 33369.000000 | 655.000000 | 24.217567 |
| 2 | Algeria | Africa | 116066.000000 | 0.000000 | 3055.000000 | 0.000000 | 75000.000000 | 0.000000 | nan |
| 3 | Andorra | Europe | 11481.000000 | 162071.000000 | 113.000000 | 0.157000 | 4914.000000 | 1264.000000 | 7.083932 |
| 4 | Angola | Africa | 21696.000000 | 0.000000 | 526.000000 | 0.000000 | 49000.000000 | 0.000000 | nan |
| 5 | Anguilla | North America | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 5348.000000 | 0.000000 | nan |
| 6 | Antigua and Barbuda | North America | 1033.000000 | 0.000000 | 28.000000 | 0.000000 | 25677.000000 | 0.000000 | nan |
| 7 | Argentina | South America | 2241739.000000 | 6838489.000000 | 54517.000000 | 0.508000 | 3063864.000000 | 591438.000000 | 32.781204 |
| 8 | Armenia | Asia | 183127.000000 | 788953.000000 | 3332.000000 | 0.504000 | 0.000000 | 0.000000 | 23.211395 |
| 9 | Australia | Oceania | 29196.000000 | 15072203.000000 | 909.000000 | 0.038000 | 253831.000000 | 0.000000 | 0.193708 |
| 10 | Austria | Europe | 511440.000000 | 19277527.000000 | 9052.000000 | 0.247000 | 1239208.000000 | 311203.000000 | 2.653037 |
| 11 | Azerbaijan | Asia | 245490.000000 | 0.000000 | 3339.000000 | 0.000000 | 453586.000000 | 0.000000 | nan |
| 12 | Bahamas | North America | 8800.000000 | 0.000000 | 186.000000 | 0.000000 | 110.000000 | 0.000000 | nan |
| 13 | Bahrain | Asia | 135326.000000 | 3368947.000000 | 498.000000 | 0.079000 | 640104.000000 | 232782.000000 | 4.016863 |
| 14 | Bangladesh | Asia | 568706.000000 | 4349615.000000 | 8668.000000 | 0.243000 | 4760747.000000 | 0.000000 | 13.074858 |
| 15 | Barbados | North America | 3533.000000 | 0.000000 | 39.000000 | 0.000000 | 58214.000000 | 0.000000 | nan |
| 16 | Belarus | Europe | 309293.000000 | 5178695.000000 | 2148.000000 | 0.122000 | 30000.000000 | 10000.000000 | 5.972412 |
| 17 | Belgium | Europe | 827941.000000 | 10322890.000000 | 22650.000000 | 0.327000 | 1323086.000000 | 419430.000000 | 8.020438 |
| 18 | Belize | North America | 12400.000000 | 0.000000 | 316.000000 | 0.000000 | 15006.000000 | 0.000000 | nan |
| 19 | Benin | Africa | 6818.000000 | 0.000000 | 90.000000 | 0.000000 | 0.000000 | 0.000000 | nan |
| 20 | Bermuda | North America | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 30481.000000 | 11674.000000 | nan |
| 21 | Bhutan | Asia | 869.000000 | 570591.000000 | 1.000000 | 0.012000 | 0.000000 | 0.000000 | 0.152298 |
| 22 | Bolivia | South America | 264411.000000 | 815270.000000 | 12051.000000 | 0.636000 | 164984.000000 | 16939.000000 | 32.432323 |
| 23 | Bosnia and Herzegovina | Europe | 151337.000000 | 733032.000000 | 5773.000000 | 0.742000 | 0.000000 | 0.000000 | 20.645347 |
| 24 | Botswana | Africa | 35493.000000 | 0.000000 | 458.000000 | 0.000000 | 0.000000 | 0.000000 | nan |
| 25 | Brazil | South America | 11950459.000000 | 6421441.000000 | 292752.000000 | 0.000000 | 13479165.000000 | 3380095.000000 | 186.102450 |
| 26 | Brunei | Asia | 205.000000 | 0.000000 | 3.000000 | 0.000000 | 0.000000 | 0.000000 | nan |
| 27 | Bulgaria | Europe | 302480.000000 | 1915561.000000 | 11966.000000 | 0.406000 | 366547.000000 | 70753.000000 | 15.790674 |
| 28 | Burkina Faso | Africa | 12516.000000 | 0.000000 | 145.000000 | 0.000000 | 0.000000 | 0.000000 | nan |
| 29 | Burundi | Africa | 2563.000000 | 0.000000 | 3.000000 | 0.000000 | 0.000000 | 0.000000 | nan |
| 30 | Cambodia | Asia | 1578.000000 | 0.000000 | 2.000000 | 0.000000 | 170659.000000 | 0.000000 | nan |
| 31 | Cameroon | Africa | 40622.000000 | 0.000000 | 601.000000 | 0.000000 | 0.000000 | 0.000000 | nan |
| 32 | Canada | North America | 935932.000000 | 26250445.000000 | 22635.000000 | 0.035000 | 3862685.000000 | 626214.000000 | 3.565395 |
| 33 | Cape Verde | Africa | 16440.000000 | 0.000000 | 159.000000 | 0.216000 | 0.000000 | 0.000000 | nan |
| 34 | Cayman Islands | North America | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 39145.000000 | 12824.000000 | nan |
| 35 | Central African Republic | Africa | 5075.000000 | 0.000000 | 64.000000 | 0.000000 | 0.000000 | 0.000000 | nan |
| 36 | Chad | Africa | 4410.000000 | 0.000000 | 157.000000 | 0.000000 | 0.000000 | 0.000000 | nan |
| 37 | Chile | South America | 925089.000000 | 10343273.000000 | 22180.000000 | 0.364000 | 8464110.000000 | 2867453.000000 | 8.943871 |
| 38 | China | Asia | 101518.000000 | 160000000.000000 | 4839.000000 | 0.000000 | 70000000.000000 | 0.000000 | 0.063449 |
| 39 | Colombia | South America | 2331187.000000 | 12059588.000000 | 61907.000000 | 0.332000 | 1131999.000000 | 54162.000000 | 19.330569 |
| 40 | Comoros | Africa | 3666.000000 | 0.000000 | 146.000000 | 0.000000 | 0.000000 | 0.000000 | nan |
| 41 | Congo | Africa | 9564.000000 | 0.000000 | 134.000000 | 0.000000 | 0.000000 | 0.000000 | nan |
| 42 | Costa Rica | North America | 211903.000000 | 623493.000000 | 2896.000000 | 0.426000 | 248082.000000 | 57994.000000 | 33.986428 |
| 43 | Cote d'Ivoire | Africa | 39913.000000 | 470459.000000 | 217.000000 | 0.295000 | 22443.000000 | 0.000000 | 8.483842 |
| 44 | Croatia | Europe | 256805.000000 | 1461537.000000 | 5753.000000 | 0.363000 | 365082.000000 | 76460.000000 | 17.570886 |
| 45 | Cuba | North America | 65962.000000 | 2730305.000000 | 392.000000 | 0.077000 | 0.000000 | 0.000000 | 2.415921 |
| 46 | Cyprus | Europe | 41882.000000 | 3044737.000000 | 242.000000 | 0.060000 | 129438.000000 | 35963.000000 | 1.375554 |
| 47 | Czechia | Europe | 1459406.000000 | 0.000000 | 24530.000000 | 0.321000 | 1330675.000000 | 357083.000000 | nan |
| 48 | Democratic Republic of Congo | Africa | 27468.000000 | 0.000000 | 726.000000 | 0.452000 | 0.000000 | 0.000000 | nan |
| 49 | Denmark | Europe | 225540.000000 | 19408441.000000 | 2400.000000 | 0.195000 | 935185.000000 | 306288.000000 | 1.162072 |
| 50 | Djibouti | Africa | 6518.000000 | 0.000000 | 63.000000 | 0.000000 | 0.000000 | 0.000000 | nan |
| 51 | Dominica | North America | 156.000000 | 0.000000 | 0.000000 | 0.000000 | 13565.000000 | 0.000000 | nan |
| 52 | Dominican Republic | North America | 248989.000000 | 1263135.000000 | 3269.000000 | 0.404000 | 675000.000000 | 0.000000 | 19.711986 |
| 53 | Ecuador | South America | 310868.000000 | 1023967.000000 | 16435.000000 | 0.407000 | 141191.000000 | 20137.000000 | 30.359181 |
| 54 | Egypt | Africa | 194771.000000 | 0.000000 | 11557.000000 | 0.000000 | 1315.000000 | 0.000000 | nan |
| 55 | El Salvador | North America | 62531.000000 | 717882.000000 | 1975.000000 | 0.175000 | 41512.000000 | 0.000000 | 8.710484 |
| 56 | Equatorial Guinea | Africa | 6736.000000 | 0.000000 | 100.000000 | 0.000000 | 6565.000000 | 800.000000 | nan |
| 57 | Eritrea | Africa | 3118.000000 | 0.000000 | 7.000000 | 0.000000 | 0.000000 | 0.000000 | nan |
| 58 | Estonia | Europe | 94028.000000 | 1071562.000000 | 780.000000 | 0.203000 | 220882.000000 | 56946.000000 | 8.774854 |
| 59 | Eswatini | Africa | 17283.000000 | 0.000000 | 665.000000 | 0.000000 | 0.000000 | 0.000000 | nan |
| 60 | Ethiopia | Africa | 185641.000000 | 2256439.000000 | 2647.000000 | 0.216000 | 0.000000 | 0.000000 | 8.227167 |
| 61 | Faeroe Islands | Europe | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 9342.000000 | 4033.000000 | nan |
| 62 | Falkland Islands | South America | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1732.000000 | 0.000000 | nan |
| 63 | Fiji | Oceania | 67.000000 | 34687.000000 | 2.000000 | 0.381000 | 0.000000 | 0.000000 | 0.193156 |
| 64 | Finland | Europe | 71123.000000 | 3734426.000000 | 805.000000 | 0.118000 | 812039.000000 | 87515.000000 | 1.904523 |
| 65 | France | Europe | 4277183.000000 | 0.000000 | 92119.000000 | 0.159000 | 7927771.000000 | 2297100.000000 | nan |
| 66 | Gabon | Africa | 17711.000000 | 0.000000 | 106.000000 | 0.000000 | 0.000000 | 0.000000 | nan |
| 67 | Gambia | Africa | 5153.000000 | 52362.000000 | 160.000000 | 0.510000 | 0.000000 | 0.000000 | 9.841106 |
| 68 | Georgia | Asia | 277218.000000 | 0.000000 | 3691.000000 | 0.000000 | 0.000000 | 0.000000 | nan |
| 69 | Germany | Europe | 2669233.000000 | 47511887.000000 | 74706.000000 | 0.154000 | 10473852.000000 | 3245985.000000 | 5.618032 |
| 70 | Ghana | Africa | 89276.000000 | 953041.000000 | 716.000000 | 0.319000 | 420000.000000 | 0.000000 | 9.367488 |
| 71 | Gibraltar | Europe | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 52331.000000 | 22293.000000 | nan |
| 72 | Greece | Europe | 235611.000000 | 5944445.000000 | 7421.000000 | 0.105000 | 1436491.000000 | 459446.000000 | 3.963549 |
| 73 | Greenland | North America | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 5130.000000 | 1203.000000 | nan |
| 74 | Grenada | North America | 154.000000 | 0.000000 | 1.000000 | 0.000000 | 8606.000000 | 0.000000 | nan |
| 75 | Guatemala | North America | 187659.000000 | 995537.000000 | 6685.000000 | 0.513000 | 66399.000000 | 0.000000 | 18.850028 |
| 76 | Guernsey | Europe | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 29383.000000 | 7883.000000 | nan |
| 77 | Guinea | Africa | 18562.000000 | 0.000000 | 108.000000 | 0.000000 | 25263.000000 | 0.000000 | nan |
| 78 | Guinea-Bissau | Africa | 3558.000000 | 0.000000 | 55.000000 | 0.000000 | 0.000000 | 0.000000 | nan |
| 79 | Guyana | South America | 9585.000000 | 0.000000 | 214.000000 | 0.000000 | 15524.000000 | 0.000000 | nan |
| 80 | Haiti | North America | 12700.000000 | 0.000000 | 251.000000 | 0.000000 | 0.000000 | 0.000000 | nan |
| 81 | Honduras | North America | 181931.000000 | 0.000000 | 4430.000000 | 0.000000 | 37317.000000 | 0.000000 | nan |
| 82 | Hong Kong | Asia | 0.000000 | 9918161.000000 | 0.000000 | 0.000000 | 330600.000000 | 0.000000 | 0.000000 |
| 83 | Hungary | Europe | 560971.000000 | 3751450.000000 | 18068.000000 | 0.354000 | 2038133.000000 | 474891.000000 | 14.953445 |
| 84 | Iceland | Europe | 6097.000000 | 290437.000000 | 29.000000 | 0.132000 | 52604.000000 | 14739.000000 | 2.099250 |
| 85 | India | Asia | 11599130.000000 | 231370546.000000 | 159755.000000 | 0.127000 | 44603841.000000 | 7478654.000000 | 5.013227 |
| 86 | Indonesia | Asia | 1455788.000000 | 7781193.000000 | 39447.000000 | 0.424000 | 7835357.000000 | 2301978.000000 | 18.709059 |
| 87 | Iran | Asia | 1793805.000000 | 11844528.000000 | 61724.000000 | 0.326000 | 10000.000000 | 0.000000 | 15.144588 |
| 88 | Iraq | Asia | 789390.000000 | 7498360.000000 | 13969.000000 | 0.215000 | 0.000000 | 0.000000 | 10.527502 |
| 89 | Ireland | Europe | 229831.000000 | 3786972.000000 | 4585.000000 | 0.255000 | 639586.000000 | 171258.000000 | 6.068991 |
| 90 | Isle of Man | Europe | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 33336.000000 | 11340.000000 | nan |
| 91 | Israel | Asia | 827220.000000 | 14437280.000000 | 6082.000000 | 0.130000 | 9686464.000000 | 4523828.000000 | 5.729750 |
| 92 | Italy | Europe | 3356331.000000 | 45894515.000000 | 104642.000000 | 0.268000 | 7708889.000000 | 2443394.000000 | 7.313142 |
| 93 | Jamaica | North America | 34665.000000 | 243363.000000 | 524.000000 | 0.290000 | 16096.000000 | 0.000000 | 14.244154 |
| 94 | Japan | Asia | 455212.000000 | 8633325.000000 | 8802.000000 | 0.204000 | 578835.000000 | 25381.000000 | 5.272731 |
| 95 | Jersey | Europe | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 45758.000000 | 5621.000000 | nan |
| 96 | Jordan | Asia | 526666.000000 | 5317747.000000 | 5788.000000 | 0.247000 | 241868.000000 | 52412.000000 | 9.903931 |
| 97 | Kazakhstan | Asia | 281798.000000 | 8211056.000000 | 3201.000000 | 0.228000 | 109995.000000 | 19247.000000 | 3.431934 |
| 98 | Kenya | Africa | 120163.000000 | 1282799.000000 | 1994.000000 | 0.192000 | 20000.000000 | 0.000000 | 9.367251 |
| 99 | Kosovo | Europe | 80295.000000 | 0.000000 | 1752.000000 | 0.000000 | 0.000000 | 0.000000 | nan |
| 100 | Kuwait | Asia | 217933.000000 | 1941949.000000 | 1215.000000 | 0.263000 | 360000.000000 | 38000.000000 | 11.222385 |
| 101 | Kyrgyzstan | Asia | 87389.000000 | 0.000000 | 1498.000000 | 0.000000 | 0.000000 | 0.000000 | nan |
| 102 | Laos | Asia | 49.000000 | 0.000000 | 0.000000 | 0.000000 | 40732.000000 | 0.000000 | nan |
| 103 | Latvia | Europe | 97149.000000 | 1671987.000000 | 1821.000000 | 0.119000 | 102320.000000 | 18936.000000 | 5.810392 |
| 104 | Lebanon | Asia | 436575.000000 | 0.000000 | 5715.000000 | 0.000000 | 135349.000000 | 42752.000000 | nan |
| 105 | Lesotho | Africa | 10535.000000 | 0.000000 | 309.000000 | 0.000000 | 0.000000 | 0.000000 | nan |
| 106 | Liberia | Africa | 2042.000000 | 0.000000 | 85.000000 | 0.000000 | 0.000000 | 0.000000 | nan |
| 107 | Libya | Africa | 150341.000000 | 0.000000 | 2487.000000 | 0.268000 | 0.000000 | 0.000000 | nan |
| 108 | Liechtenstein | Europe | 2627.000000 | 0.000000 | 56.000000 | 0.000000 | 4215.000000 | 0.000000 | nan |
| 109 | Lithuania | Europe | 208650.000000 | 2264247.000000 | 3464.000000 | 0.254000 | 399863.000000 | 124481.000000 | 9.214984 |
| 110 | Luxembourg | Europe | 58955.000000 | 2280826.000000 | 714.000000 | 0.216000 | 70339.000000 | 17469.000000 | 2.584809 |
| 111 | Macao | Asia | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 27637.000000 | 0.000000 | nan |
| 112 | Madagascar | Africa | 22275.000000 | 132187.000000 | 345.000000 | 0.547000 | 0.000000 | 0.000000 | 16.851128 |
| 113 | Malawi | Africa | 33216.000000 | 210730.000000 | 1093.000000 | 0.377000 | 15326.000000 | 0.000000 | 15.762350 |
| 114 | Malaysia | Asia | 331713.000000 | 7029970.000000 | 1229.000000 | 0.163000 | 399525.000000 | 0.000000 | 4.718555 |
| 115 | Maldives | Asia | 22373.000000 | 588650.000000 | 65.000000 | 0.142000 | 212711.000000 | 0.000000 | 3.800730 |
| 116 | Mali | Africa | 9270.000000 | 0.000000 | 367.000000 | 0.000000 | 0.000000 | 0.000000 | nan |
| 117 | Malta | Europe | 27904.000000 | 772206.000000 | 369.000000 | 0.077000 | 140331.000000 | 43267.000000 | 3.613544 |
| 118 | Marshall Islands | Oceania | 4.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | nan |
| 119 | Mauritania | Africa | 17587.000000 | 233820.000000 | 446.000000 | 0.186000 | 0.000000 | 0.000000 | 7.521598 |
| 120 | Mauritius | Africa | 796.000000 | 0.000000 | 10.000000 | 0.000000 | 3843.000000 | 0.000000 | nan |
| 121 | Mexico | North America | 2193639.000000 | 5399163.000000 | 197827.000000 | 0.531000 | 5459014.000000 | 695667.000000 | 40.629242 |
| 122 | Micronesia (country) | Oceania | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | nan |
| 123 | Moldova | Europe | 214203.000000 | 0.000000 | 4531.000000 | 0.000000 | 18593.000000 | 0.000000 | nan |
| 124 | Monaco | Europe | 2173.000000 | 0.000000 | 27.000000 | 0.000000 | 18081.000000 | 8331.000000 | nan |
| 125 | Mongolia | Asia | 4806.000000 | 2061669.000000 | 4.000000 | 0.017000 | 204121.000000 | 0.000000 | 0.233112 |
| 126 | Montenegro | Europe | 86782.000000 | 0.000000 | 1194.000000 | 0.000000 | 7298.000000 | 461.000000 | nan |
| 127 | Montserrat | North America | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 932.000000 | 40.000000 | nan |
| 128 | Morocco | Africa | 491463.000000 | 5347174.000000 | 8763.000000 | 0.265000 | 6687548.000000 | 2423380.000000 | 9.191079 |
| 129 | Mozambique | Africa | 66064.000000 | 458121.000000 | 743.000000 | 0.336000 | 46439.000000 | 0.000000 | 14.420644 |
| 130 | Myanmar | Asia | 142212.000000 | 2482290.000000 | 3204.000000 | 0.218000 | 380000.000000 | 0.000000 | 5.729065 |
| 131 | Namibia | Africa | 42203.000000 | 327300.000000 | 492.000000 | 0.257000 | 0.000000 | 0.000000 | 12.894287 |
| 132 | Nepal | Asia | 275829.000000 | 2218722.000000 | 3016.000000 | 0.252000 | 1600000.000000 | 0.000000 | 12.431886 |
| 133 | Netherlands | Europe | 1211447.000000 | 7184008.000000 | 16395.000000 | 0.292000 | 1887726.000000 | 493123.000000 | 16.863108 |
| 134 | New Zealand | Oceania | 2453.000000 | 1840473.000000 | 26.000000 | 0.040000 | 27000.000000 | 0.000000 | 0.133281 |
| 135 | Nicaragua | North America | 6582.000000 | 0.000000 | 176.000000 | 0.000000 | 0.000000 | 0.000000 | nan |
| 136 | Niger | Africa | 4918.000000 | 0.000000 | 185.000000 | 0.000000 | 0.000000 | 0.000000 | nan |
| 137 | Nigeria | Africa | 161651.000000 | 1684305.000000 | 2030.000000 | 0.298000 | 8000.000000 | 0.000000 | 9.597490 |
| 138 | North Macedonia | Europe | 118736.000000 | 578312.000000 | 3448.000000 | 0.398000 | 5300.000000 | 0.000000 | 20.531478 |
| 139 | Northern Cyprus | Asia | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 11000.000000 | 0.000000 | nan |
| 140 | Norway | Europe | 86362.000000 | 4274069.000000 | 648.000000 | 0.095000 | 758514.000000 | 261503.000000 | 2.020604 |
| 141 | Oman | Asia | 149135.000000 | 0.000000 | 1620.000000 | 0.361000 | 109844.000000 | 19019.000000 | nan |
| 142 | Pakistan | Asia | 626802.000000 | 9691087.000000 | 13843.000000 | 0.256000 | 350000.000000 | 0.000000 | 6.467819 |
| 143 | Palestine | Asia | 221391.000000 | 0.000000 | 2406.000000 | 0.317000 | 0.000000 | 0.000000 | nan |
| 144 | Panama | North America | 350665.000000 | 2035210.000000 | 6042.000000 | 0.414000 | 297165.000000 | 0.000000 | 17.229917 |
| 145 | Papua New Guinea | Oceania | 3085.000000 | 0.000000 | 36.000000 | 0.000000 | 0.000000 | 0.000000 | nan |
| 146 | Paraguay | South America | 192599.000000 | 821942.000000 | 3695.000000 | 0.446000 | 14696.000000 | 0.000000 | 23.432189 |
| 147 | Peru | South America | 1451645.000000 | 4395068.000000 | 49897.000000 | 0.377000 | 623800.000000 | 187613.000000 | 33.028954 |
| 148 | Philippines | Asia | 656056.000000 | 8938938.000000 | 12930.000000 | 0.149000 | 240297.000000 | 0.000000 | 7.339306 |
| 149 | Poland | Europe | 2036700.000000 | 10639405.000000 | 49159.000000 | 0.503000 | 4983494.000000 | 1769770.000000 | 19.142988 |
| 150 | Portugal | Europe | 817080.000000 | 8671839.000000 | 16762.000000 | 0.204000 | 1325266.000000 | 432894.000000 | 9.422223 |
| 151 | Qatar | Asia | 173206.000000 | 1648555.000000 | 272.000000 | 0.393000 | 510000.000000 | 0.000000 | 10.506535 |
| 152 | Romania | Europe | 892848.000000 | 6445769.000000 | 22132.000000 | 0.298000 | 2426191.000000 | 768921.000000 | 13.851691 |
| 153 | Russia | Europe | 4397816.000000 | 116724405.000000 | 93090.000000 | 0.097000 | 8306498.000000 | 2710605.000000 | 3.767692 |
| 154 | Rwanda | Africa | 20761.000000 | 1077055.000000 | 287.000000 | 0.060000 | 329410.000000 | 0.000000 | 1.927571 |
| 155 | Saint Helena | Africa | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 3107.000000 | 0.000000 | nan |
| 156 | Saint Kitts and Nevis | North America | 44.000000 | 0.000000 | 0.000000 | 0.000000 | 7580.000000 | 0.000000 | nan |
| 157 | Saint Lucia | North America | 4113.000000 | 0.000000 | 55.000000 | 0.000000 | 20247.000000 | 0.000000 | nan |
| 158 | Saint Vincent and the Grenadines | North America | 1696.000000 | 0.000000 | 9.000000 | 0.000000 | 9383.000000 | 0.000000 | nan |
| 159 | Samoa | Oceania | 3.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | nan |
| 160 | San Marino | Europe | 4356.000000 | 0.000000 | 79.000000 | 0.000000 | 7923.000000 | 35.000000 | nan |
| 161 | Sao Tome and Principe | Africa | 2142.000000 | 0.000000 | 34.000000 | 0.000000 | 0.000000 | 0.000000 | nan |
| 162 | Saudi Arabia | Asia | 384653.000000 | 14503622.000000 | 6602.000000 | 0.194000 | 2999798.000000 | 0.000000 | 2.652117 |
| 163 | Senegal | Africa | 37693.000000 | 430894.000000 | 1007.000000 | 0.571000 | 150857.000000 | 0.000000 | 8.747627 |
| 164 | Serbia | Europe | 546896.000000 | 3218400.000000 | 4900.000000 | 0.442000 | 2163593.000000 | 858461.000000 | 16.992791 |
| 165 | Seychelles | Africa | 3770.000000 | 0.000000 | 16.000000 | 0.000000 | 90150.000000 | 27693.000000 | nan |
| 166 | Sierra Leone | Africa | 3948.000000 | 0.000000 | 79.000000 | 0.000000 | 0.000000 | 0.000000 | nan |
| 167 | Singapore | Asia | 60184.000000 | 8055714.000000 | 30.000000 | 0.300000 | 792423.000000 | 243169.000000 | 0.747097 |
| 168 | Slovakia | Europe | 347944.000000 | 21061465.000000 | 8978.000000 | 0.178000 | 718369.000000 | 229980.000000 | 1.652041 |
| 169 | Slovenia | Europe | 205509.000000 | 2313303.000000 | 3967.000000 | 0.304000 | 286151.000000 | 103865.000000 | 8.883791 |
| 170 | Solomon Islands | Oceania | 18.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | nan |
| 171 | Somalia | Africa | 9968.000000 | 0.000000 | 419.000000 | 0.000000 | 0.000000 | 0.000000 | nan |
| 172 | South Africa | Africa | 1536801.000000 | 9556404.000000 | 52082.000000 | 0.326000 | 182983.000000 | 182983.000000 | 16.081373 |
| 173 | South Korea | Asia | 98665.000000 | 7176600.000000 | 1696.000000 | 0.049000 | 676900.000000 | 313.000000 | 1.374815 |
| 174 | South Sudan | Africa | 9849.000000 | 124125.000000 | 106.000000 | 0.242000 | 0.000000 | 0.000000 | 7.934743 |
| 175 | Spain | Europe | 3212332.000000 | 34785710.000000 | 72910.000000 | 0.429000 | 5993363.000000 | 1886813.000000 | 9.234631 |
| 176 | Sri Lanka | Asia | 89655.000000 | 2309954.000000 | 544.000000 | 0.096000 | 824523.000000 | 0.000000 | 3.881246 |
| 177 | Sudan | Africa | 30989.000000 | 0.000000 | 1959.000000 | 0.000000 | 0.000000 | 0.000000 | nan |
| 178 | Suriname | South America | 9061.000000 | 0.000000 | 176.000000 | 0.000000 | 11879.000000 | 0.000000 | nan |
| 179 | Sweden | Europe | 744272.000000 | 0.000000 | 13262.000000 | 0.247000 | 1293923.000000 | 383498.000000 | nan |
| 180 | Switzerland | Europe | 580609.000000 | 4699813.000000 | 10203.000000 | 0.270000 | 1176875.000000 | 432194.000000 | 12.353875 |
| 181 | Syria | Asia | 17240.000000 | 0.000000 | 1153.000000 | 0.000000 | 0.000000 | 0.000000 | nan |
| 182 | Taiwan | Asia | 1005.000000 | 183386.000000 | 10.000000 | 0.028000 | 0.000000 | 0.000000 | 0.548024 |
| 183 | Tajikistan | Asia | 13308.000000 | 0.000000 | 90.000000 | 0.000000 | 0.000000 | 0.000000 | nan |
| 184 | Tanzania | Africa | 509.000000 | 0.000000 | 21.000000 | 0.000000 | 0.000000 | 0.000000 | nan |
| 185 | Thailand | Asia | 27594.000000 | 2894666.000000 | 90.000000 | 0.286000 | 53842.000000 | 0.000000 | 0.953271 |
| 186 | Timor | Asia | 271.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | nan |
| 187 | Togo | Africa | 8839.000000 | 258894.000000 | 102.000000 | 0.134000 | 0.000000 | 0.000000 | 3.414139 |
| 188 | Trinidad and Tobago | North America | 7839.000000 | 103786.000000 | 140.000000 | 0.496000 | 991.000000 | 0.000000 | 7.553042 |
| 189 | Tunisia | Africa | 245405.000000 | 225033.000000 | 8526.000000 | 0.341000 | 6861.000000 | 0.000000 | 109.052894 |
| 190 | Turkey | Asia | 2992694.000000 | 35787480.000000 | 29959.000000 | 0.191000 | 13029754.000000 | 5013676.000000 | 8.362405 |
| 191 | Turks and Caicos Islands | North America | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 6433.000000 | 0.000000 | nan |
| 192 | Uganda | Africa | 40651.000000 | 910045.000000 | 334.000000 | 0.162000 | 13027.000000 | 0.000000 | 4.466922 |
| 193 | Ukraine | Europe | 1584972.000000 | 7573737.000000 | 31344.000000 | 0.504000 | 108310.000000 | 1.000000 | 20.927212 |
| 194 | United Arab Emirates | Asia | 438638.000000 | 34913667.000000 | 1433.000000 | 0.024000 | 7181056.000000 | 2187849.000000 | 1.256350 |
| 195 | United Kingdom | Europe | 4304839.000000 | 107584947.000000 | 126359.000000 | 0.300000 | 28985958.000000 | 2132551.000000 | 4.001340 |
| 196 | United States | North America | 29785285.000000 | 355058178.000000 | 541926.000000 | 0.202000 | 121441497.000000 | 43036818.000000 | 8.388846 |
| 197 | Uruguay | South America | 79923.000000 | 1174028.000000 | 776.000000 | 0.129000 | 311282.000000 | 0.000000 | 6.807589 |
| 198 | Uzbekistan | Asia | 81339.000000 | 0.000000 | 622.000000 | 0.000000 | 0.000000 | 0.000000 | nan |
| 199 | Vanuatu | Oceania | 3.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | nan |
| 200 | Vatican | Europe | 27.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | nan |
| 201 | Venezuela | South America | 150306.000000 | 0.000000 | 1483.000000 | 0.000000 | 12194.000000 | 0.000000 | nan |
| 202 | Vietnam | Asia | 2572.000000 | 1469955.000000 | 35.000000 | 0.016000 | 30971.000000 | 0.000000 | 0.174971 |
| 203 | Yemen | Asia | 3278.000000 | 0.000000 | 737.000000 | 0.000000 | 0.000000 | 0.000000 | nan |
| 204 | Zambia | Africa | 86273.000000 | 1180598.000000 | 1178.000000 | 0.301000 | 0.000000 | 0.000000 | 7.307568 |
| 205 | Zimbabwe | Africa | 36662.000000 | 412342.000000 | 1510.000000 | 0.287000 | 42210.000000 | 0.000000 | 8.891163 |
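The loop above computes confirmed cases as a percentage of tests. Despite its name, the `test per confirmed(%)` column therefore holds cases per 100 tests, with NaN where no tests are recorded. Below is a minimal vectorized sketch of the same calculation on made-up numbers; `toy_loc` is hypothetical and stands in for `df_loc`. It also flags ratios above 100, as seen for Brazil (186%) and Tunisia (109%) in the table above. In those rows the confirmed cases exceed the recorded tests, which most likely points to incomplete testing data rather than a real rate.

```python
import pandas as pd

# Hypothetical per-country maxima standing in for df_loc (values are made up).
toy_loc = pd.DataFrame(
    {"total_cases": [120.0, 50.0, 300.0], "total_tests": [1000.0, 0.0, 200.0]},
    index=["A", "B", "C"],
)

tests = toy_loc["total_tests"]
# Cases per 100 tests; .where() turns rows with no recorded tests into NaN,
# matching the row-by-row loop (division by 0 would otherwise give inf).
toy_loc["cases_per_100_tests"] = (toy_loc["total_cases"] * 100 / tests).where(tests > 0)

# More confirmed cases than tests suggests the testing series is incomplete.
suspect = toy_loc.index[toy_loc["cases_per_100_tests"] > 100].tolist()
print(toy_loc)
print(suspect)
```

The vectorized form avoids `iterrows`, which is slow on large frames, and gives the same NaN handling.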
Hover over each country to see how the virus has spread. The color shows total_cases on a logarithmic scale, and the slider steps through the dates.
# Animated choropleth of cumulative cases; log1p(x) = log(1 + x) stays finite for countries with 0 cases
fig = px.choropleth(covid_df, locations="location",
                    color=np.log1p(covid_df["total_cases"]),
                    locationmode="country names", hover_name="location",
                    animation_frame=covid_df["date"],
                    title='Cases over time', color_continuous_scale=px.colors.sequential.matter)
#fig.update(layout_coloraxis_showscale=False)
fig.show()
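Total case counts range from zero to tens of millions, so a linear color scale would paint almost every country the same shade; that is why the map colors by a logarithm. A minimal sketch of the transform on illustrative counts (not pulled from the dataset): `np.log(0)` returns `-inf` with a runtime warning, whereas `np.log1p` maps zero to zero and behaves like `log(x)` for large counts.

```python
import numpy as np

# Illustrative case counts spanning several orders of magnitude (not real data)
cases = np.array([0, 3, 2_572, 29_785_285])

# np.log1p(x) = log(1 + x): finite at 0, close to log(x) for large x
log_cases = np.log1p(cases)
print(log_cases.round(2))
```

The transformed values fall in a narrow range, so the color scale can tell small and large outbreaks apart.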
def plot_map(df, col, pal):
    # Static choropleth of `col` per country, using the continuous color scale `pal`
    fig = px.choropleth(df, locations="location", locationmode='country names',
                        color=col, hover_name="location",
                        title=col, hover_data=[col], color_continuous_scale=pal)
    # fig.update_layout(coloraxis_showscale=False)
    fig.show()
covid_deaths=covid_df[["continent","location","total_cases","date","total_deaths","total_deaths_per_million","total_cases_per_million","total_vaccinations"]]
df=covid_deaths.dropna(axis=0)
df_data=df.groupby(['location']).max()
df_data.drop(["date"],axis=1,inplace=True)
df_data.reset_index(inplace=True)
#df_data.drop(index=171,inplace=True)
df_data
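Because `total_cases`, `total_deaths` and `total_vaccinations` are cumulative, taking the per-country maximum with `groupby(...).max()` recovers the latest reported total, provided the series never decreases (occasional data revisions can break this). A toy sketch; the country names and numbers are hypothetical:

```python
import pandas as pd

# Hypothetical daily rows for two countries (cumulative columns only grow)
rows = pd.DataFrame({
    "location": ["A", "A", "A", "B", "B"],
    "date": ["2021-03-01", "2021-03-02", "2021-03-03", "2021-03-01", "2021-03-02"],
    "total_cases": [100, 150, 180, 40, 55],
    "total_deaths": [1, 2, 4, 0, 1],
})

# Per-country maximum of each cumulative column = most recent total
latest = rows.groupby("location")[["total_cases", "total_deaths"]].max().reset_index()
print(latest)
```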
df_data[df_data["continent"]=="Africa"].sum()
| | |
|---|---|
| location | AlgeriaAngolaBeninBotswanaBurkina FasoBurundiC... |
| continent | AfricaAfricaAfricaAfricaAfricaAfricaAfricaAfri... |
| total_cases | 4.09742e+06 |
| total_deaths | 109674 |
| total_deaths_per_million | 5353.83 |
| total_cases_per_million | 287182 |
| total_vaccinations | 8.19935e+06 |

dtype: object
For Africa, the number of confirmed cases is lower than on other continents. This is likely due, at least in part, to the low number of tests performed there.
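Summing per-country rows gives continent totals. Passing `numeric_only=True` skips text columns such as `location`, which the plain `.sum()` above concatenates into one long string. Sums of per-million columns are not continent-level rates, so the sketch below only sums raw counts. The table is hypothetical:

```python
import pandas as pd

# Hypothetical per-country totals (not real data)
df = pd.DataFrame({
    "continent": ["Africa", "Africa", "Europe", "Europe"],
    "location": ["C1", "C2", "C3", "C4"],
    "total_cases": [1_000, 3_000, 50_000, 20_000],
    "total_deaths": [20, 60, 900, 400],
})

# One row per continent; the text column `location` is excluded from the sum
by_continent = df.groupby("continent").sum(numeric_only=True)
print(by_continent)
```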
Hover over each country to see its total number of confirmed cases.
plot_map(df_data,'total_cases', 'matter')
The US, Brazil and India stand out.
Hover over each country to see its total number of deaths.
plot_map(df_data,'total_deaths', 'matter')
The US, Brazil, Mexico and India stand out.
Hover over each country to see its total number of deaths per million inhabitants.
plot_map(df_data,'total_deaths_per_million', 'matter')
def plot_hbar(df, col, n, hover_data=None):
    # Horizontal bar chart of the n countries with the largest values of `col`.
    # Missing values are dropped first: sort_values puts NaN last, so tail(n) would select them.
    top = df.dropna(subset=[col]).sort_values(col).tail(n)
    fig = px.bar(top,
                 x=col, y="location", color='continent',
                 text=col, orientation='h', width=700, hover_data=hover_data,
                 color_discrete_sequence=px.colors.qualitative.Dark2)
    fig.update_layout(title=col, xaxis_title="", yaxis_title="",
                      yaxis_categoryorder='total ascending',
                      uniformtext_minsize=8, uniformtext_mode='hide')
    fig.show()
plot_hbar(df_data, 'total_cases', 15)
plot_hbar(df_data, 'total_deaths', 15)
plot_hbar(df_data, 'total_deaths_per_million', 15)
plot_hbar(df_covid, "total_tests", 15)
plot_hbar(df_covid,"total_vaccinations", 15)
plot_hbar(df_covid,"people_fully_vaccinated", 15)
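`plot_hbar` selects the top `n` rows with `sort_values(col).tail(n)`. Because `sort_values` places missing values last by default, a country with a NaN in `col` lands among those `n` rows. A sketch on hypothetical data showing the effect and the `dropna` guard:

```python
import pandas as pd

# Hypothetical totals; country "F" has a missing value
df = pd.DataFrame({"location": ["A", "B", "C", "D", "E", "F"],
                   "total_cases": [50, 10, 40, 30, 20, None]})

n = 3
# NaN is sorted to the end, so it is picked up by tail(n)
naive = df.sort_values("total_cases").tail(n)
# Dropping missing values first keeps only real top-n countries
safe = df.dropna(subset=["total_cases"]).sort_values("total_cases").tail(n)
print(list(naive["location"]), list(safe["location"]))
```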
Treemaps show each country as a rectangle whose area is proportional to the plotted value, which makes it easy to compare many countries at once.
def _treemap(df, col):
    # One rectangle per country, sized by `col`
    fig = px.treemap(df, path=["location"], values=col, height=700,
                     title=col, color_discrete_sequence=px.colors.qualitative.Dark2)
    fig.data[0].textinfo = 'label+text+value'
    fig.show()

def plot_treemap(col):
    # Columns from the aggregated per-country table df_data
    _treemap(df_data, col)

def plot_treemap_(col):
    # Testing and vaccination columns from df_covid
    _treemap(df_covid, col)
plot_treemap('total_cases')
plot_treemap('total_deaths')
plot_treemap_('total_tests')
plot_treemap_('test per confirmed(%)')
plot_treemap_('total_vaccinations')
plot_treemap_('people_fully_vaccinated')
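Since tile area tracks the plotted value, each country's tile covers roughly its share of the column total (padding and labels make this approximate). Those shares can also be computed directly; a sketch with hypothetical totals:

```python
import pandas as pd

# Hypothetical totals for three countries
totals = pd.Series({"A": 600, "B": 300, "C": 100}, name="total_cases")

# Percentage of the treemap area each tile would roughly occupy
share = (totals / totals.sum() * 100).round(1)
print(share)
```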
covid_df['death_rate'] = (covid_df['new_deaths_smoothed_per_million'] / covid_df['new_cases_smoothed_per_million']).replace(np.inf,np.nan)
covid_df['population_coverage'] = covid_df['total_tests'] / covid_df['population']
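`death_rate` divides smoothed new deaths by smoothed new cases on the same day. When a country reports zero new cases, the division gives `inf` (positive deaths) or `NaN` (0/0); turning `inf` into `NaN` keeps such days out of later means. The sketch below also replaces `-inf`, on the assumption that negative data revisions could produce it. All values are made up:

```python
import numpy as np
import pandas as pd

# Hypothetical smoothed daily values per million for four days
df = pd.DataFrame({
    "new_deaths_smoothed_per_million": [0.5, 0.0, 0.2, 0.0],
    "new_cases_smoothed_per_million": [25.0, 10.0, 0.0, 0.0],
})

ratio = df["new_deaths_smoothed_per_million"] / df["new_cases_smoothed_per_million"]
# x/0 -> inf and 0/0 -> NaN; map infinities to NaN so .mean() skips them
df["death_rate"] = ratio.replace([np.inf, -np.inf], np.nan)
print(df["death_rate"].tolist())
```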
trace1 = go.Scatter(
x=covid_df.groupby(['date'])['date'].apply(lambda x: np.unique(x)[0]),
y=covid_df.groupby(['date'])['new_deaths_smoothed_per_million'].mean(),
xaxis='x2',
yaxis='y2',
name = "mean new deaths smoothed per million"
)
trace2 = go.Scatter(
x=covid_df.groupby(['date'])['date'].apply(lambda x: np.unique(x)[0]),
y=covid_df.groupby(['date'])['new_tests_smoothed_per_thousand'].mean(),
name = "mean new tests smoothed per thousand"
)
trace3 = go.Scatter(
x=covid_df.groupby(['date'])['date'].apply(lambda x: np.unique(x)[0]),
y=(covid_df.groupby(['date'])['death_rate'].mean().replace([np.inf],np.nan).interpolate(method='linear', limit_direction='forward', axis=0) * 100).round(3),
xaxis='x3',
yaxis='y3',
name = "interpolated death rate %"
)
trace4 = go.Scatter(
x=covid_df.groupby(['date'])['date'].apply(lambda x: np.unique(x)[0]),
y=((covid_df.groupby(['date'])['new_cases_per_million'].apply(lambda x: np.mean(x/1e+6))) * 100).round(6),
xaxis='x4',
yaxis='y4',
name = "mean covid population d2d coverage %"
)
data = [trace1, trace2, trace3, trace4]
layout = go.Layout(
xaxis=dict(
domain=[0, 0.45]
),
yaxis=dict(
domain=[0, 0.45]
),
xaxis2=dict(
domain=[0.55, 1]
),
xaxis3=dict(
domain=[0, 0.45],
anchor='y3'
),
xaxis4=dict(
domain=[0.55, 1],
anchor='y4'
),
yaxis2=dict(
domain=[0, 0.45],
anchor='x2'
),
yaxis3=dict(
domain=[0.55, 1]
),
yaxis4=dict(
domain=[0.55, 1],
anchor='x4'
),
title = 'Mean new deaths per 1M, new tests per 1K, death rate and covid mean coverage'
)
fig = go.Figure(data=data, layout=layout)
iplot(fig)
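The death-rate panel fills gaps in the daily mean with linear interpolation. With `limit_direction='forward'`, pandas fills NaNs that follow a valid value but leaves leading NaNs untouched. Interpolated points are estimates, not reported data. A toy series with hypothetical values shows the effect:

```python
import numpy as np
import pandas as pd

# Hypothetical daily mean death rates with gaps
s = pd.Series([np.nan, 0.010, np.nan, 0.030])

filled = s.interpolate(method="linear", limit_direction="forward")
print(filled.tolist())
```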
Line plots show the day-by-day trend of a variable for each continent on shared axes, so the continents can be compared directly.
def plot_line(col, title):
    # Daily sum of `col` over all countries of each continent (Oceania is not plotted)
    continent_colors = {'Asia': 'green', 'Europe': 'red', 'Africa': 'blue',
                        'North America': 'black', 'South America': 'brown'}
    data = []
    for continent, color in continent_colors.items():
        daily = covid_df[covid_df['continent'] == continent].groupby('date')[col].sum()
        data.append(go.Scatter(x=daily.index, y=daily.values, mode="lines",
                               name=continent, marker=dict(color=color)))
    layout = dict(title=title,
                  xaxis=dict(title="date", ticklen=5, zeroline=False),
                  yaxis=dict(title=title))
    iplot(dict(data=data, layout=layout))
plot_line('new_deaths_smoothed','New Deaths Smoothed')
plot_line('new_vaccinations_smoothed','New vaccinations smoothed')
plot_line('total_vaccinations','Total vaccinations')
plot_line('new_tests_smoothed','New tests smoothed')
plot_line('positive_rate','Positive rate')
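`plot_line` sums each column over a continent's countries for every date. The same date-by-continent table can be built in one step with `pivot_table`, giving one column per continent. Summing only makes sense for counts: adding up a rate such as `positive_rate` across countries does not produce a continent-level rate. A sketch on hypothetical rows:

```python
import pandas as pd

# Hypothetical daily rows: two Asian countries and one European country per date
rows = pd.DataFrame({
    "date": ["2021-03-01"] * 3 + ["2021-03-02"] * 3,
    "continent": ["Asia", "Asia", "Europe"] * 2,
    "new_deaths_smoothed": [1.0, 2.0, 5.0, 1.5, 2.5, 4.0],
})

# Rows = dates, columns = continents, values = sum over that continent's countries
daily = rows.pivot_table(index="date", columns="continent",
                         values="new_deaths_smoothed", aggfunc="sum")
print(daily)
```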
def plot_line_mean(col, title):
    # Daily mean of `col` over the countries of each continent since 2020-03-01, in percent
    continent_colors = {'Asia': 'green', 'Europe': 'red', 'Africa': 'blue',
                        'North America': 'black', 'South America': 'brown'}
    recent = covid_df[covid_df['date'] >= '2020-03-01']
    data = []
    for continent, color in continent_colors.items():
        daily = recent[recent['continent'] == continent].groupby('date')[col].mean() * 100
        data.append(go.Scatter(x=daily.index, y=daily.values, mode="lines",
                               name=continent, marker=dict(color=color)))
    layout = dict(title=title,
                  xaxis=dict(title='date', ticklen=5, zeroline=False),
                  yaxis=dict(title='mean {} (%)'.format(col)))
    iplot(dict(data=data, layout=layout))
plot_line_mean('death_rate','Mean death rate over continents')
plot_line_mean('population_coverage','Mean population test coverage over continents')
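`plot_line_mean` averages per-country ratios, so every country counts equally regardless of size. A pooled ratio (continent-wide deaths divided by continent-wide cases) weights countries by their case counts and can differ a lot. A sketch with two hypothetical countries:

```python
import pandas as pd

# Hypothetical single-day values for two countries on one continent
day = pd.DataFrame({
    "location": ["Small", "Large"],
    "new_deaths": [1.0, 100.0],
    "new_cases": [10.0, 10_000.0],
})

# Unweighted mean of per-country rates (what groupby(...).mean() computes)
mean_of_rates = (day["new_deaths"] / day["new_cases"]).mean() * 100
# Pooled rate: total deaths over total cases
pooled_rate = day["new_deaths"].sum() / day["new_cases"].sum() * 100
print(round(mean_of_rates, 2), round(pooled_rate, 2))
```

Here the small country's 10% rate pulls the unweighted mean far above the pooled rate of about 1%.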
continent_colors = {'Asia': 'green', 'Europe': 'red', 'Africa': 'blue',
                    'North America': 'black', 'South America': 'brown'}
recent = covid_df[covid_df['date'] >= '2020-03-01']
data = []
for continent, color in continent_colors.items():
    # Mean of total_tests / population over the continent's countries, as a percentage
    coverage = recent[recent['continent'] == continent].groupby('date')['population_coverage'].mean() * 100
    data.append(go.Scatter(x=coverage.index, y=coverage.values, mode="lines",
                           name=continent, marker=dict(color=color)))
layout = dict(title='Mean population test coverage over continents',
              xaxis=dict(title='date', ticklen=5, zeroline=False),
              yaxis=dict(title='mean tests/population %'))
fig = dict(data=data, layout=layout)
iplot(fig)
# Total deaths vs. GDP per capita (bubble size: total cases)
covid_df_grouped = (covid_df.groupby(['location', 'continent'])
                    .agg({'new_deaths': 'sum', 'gdp_per_capita': 'mean', 'new_cases': 'sum'})
                    .reset_index()
                    .dropna())
fig = px.scatter(covid_df_grouped, x="new_deaths", y="gdp_per_capita", size="new_cases",
                 color="continent", hover_name="location", log_x=True, size_max=60)
fig.show()
# Summed smoothed deaths per million vs. handwashing facilities (bubble size: extreme poverty)
covid_df_grouped = (covid_df.groupby(['location', 'continent'])
                    .agg({'handwashing_facilities': 'mean', 'new_deaths_smoothed_per_million': 'sum', 'extreme_poverty': 'mean'})
                    .reset_index()
                    .dropna())
fig = px.scatter(covid_df_grouped, x="new_deaths_smoothed_per_million", y="handwashing_facilities", size="extreme_poverty",
                 color="continent", hover_name="location", log_x=True, size_max=60)
fig.show()
# Summed smoothed deaths per million vs. share aged 70+ (bubble size: population density)
covid_df_grouped = (covid_df.groupby(['location', 'continent'])
                    .agg({'population_density': 'mean', 'new_deaths_smoothed_per_million': 'sum', 'aged_70_older': 'mean'})
                    .reset_index()
                    .dropna())
fig = px.scatter(covid_df_grouped, x="new_deaths_smoothed_per_million", y="aged_70_older", size="population_density",
                 color="continent", hover_name="location", log_x=True, size_max=60)
fig.show()
# Summed smoothed deaths per million vs. life expectancy (bubble size: hospital beds per thousand)
covid_df_grouped = (covid_df.groupby(['location', 'continent'])
                    .agg({'life_expectancy': 'mean', 'new_deaths_smoothed_per_million': 'sum', 'hospital_beds_per_thousand': 'mean'})
                    .reset_index()
                    .dropna())
fig = px.scatter(covid_df_grouped, x="new_deaths_smoothed_per_million", y="life_expectancy", size="hospital_beds_per_thousand",
                 color="continent", hover_name="location", log_x=True, size_max=60)
fig.show()
# Mean death rate vs. mean stringency index (bubble size: total cases)
covid_df_grouped = (covid_df.groupby(['location', 'continent'])
                    .agg({'death_rate': 'mean', 'stringency_index': 'mean', 'new_cases': 'sum'})
                    .reset_index()
                    .dropna())
fig = px.scatter(covid_df_grouped, x="death_rate", y="stringency_index", size="new_cases",
                 color="continent", hover_name="location", log_x=True, size_max=60)
fig.show()
covid_df_copy = covid_df.copy()
covid_df_copy.head(10)
| | iso_code | continent | location | date | total_cases | new_cases | new_cases_smoothed | total_deaths | new_deaths | new_deaths_smoothed | ... | cardiovasc_death_rate | diabetes_prevalence | female_smokers | male_smokers | handwashing_facilities | hospital_beds_per_thousand | life_expectancy | human_development_index | death_rate | population_coverage |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | AFG | Asia | Afghanistan | 2020-02-24 | 1.0 | 1.0 | 0.000 | 0.0 | 0.0 | 0.0 | ... | 597.029 | 9.59 | 0.0 | 0.0 | 37.746 | 0.5 | 64.83 | 0.511 | NaN | 0.0 |
| 1 | AFG | Asia | Afghanistan | 2020-02-25 | 1.0 | 0.0 | 0.000 | 0.0 | 0.0 | 0.0 | ... | 597.029 | 9.59 | 0.0 | 0.0 | 37.746 | 0.5 | 64.83 | 0.511 | NaN | 0.0 |
| 2 | AFG | Asia | Afghanistan | 2020-02-26 | 1.0 | 0.0 | 0.000 | 0.0 | 0.0 | 0.0 | ... | 597.029 | 9.59 | 0.0 | 0.0 | 37.746 | 0.5 | 64.83 | 0.511 | NaN | 0.0 |
| 3 | AFG | Asia | Afghanistan | 2020-02-27 | 1.0 | 0.0 | 0.000 | 0.0 | 0.0 | 0.0 | ... | 597.029 | 9.59 | 0.0 | 0.0 | 37.746 | 0.5 | 64.83 | 0.511 | NaN | 0.0 |
| 4 | AFG | Asia | Afghanistan | 2020-02-28 | 1.0 | 0.0 | 0.000 | 0.0 | 0.0 | 0.0 | ... | 597.029 | 9.59 | 0.0 | 0.0 | 37.746 | 0.5 | 64.83 | 0.511 | NaN | 0.0 |
| 5 | AFG | Asia | Afghanistan | 2020-02-29 | 1.0 | 0.0 | 0.143 | 0.0 | 0.0 | 0.0 | ... | 597.029 | 9.59 | 0.0 | 0.0 | 37.746 | 0.5 | 64.83 | 0.511 | 0.0 | 0.0 |
| 6 | AFG | Asia | Afghanistan | 2020-03-01 | 1.0 | 0.0 | 0.143 | 0.0 | 0.0 | 0.0 | ... | 597.029 | 9.59 | 0.0 | 0.0 | 37.746 | 0.5 | 64.83 | 0.511 | 0.0 | 0.0 |
| 7 | AFG | Asia | Afghanistan | 2020-03-02 | 1.0 | 0.0 | 0.000 | 0.0 | 0.0 | 0.0 | ... | 597.029 | 9.59 | 0.0 | 0.0 | 37.746 | 0.5 | 64.83 | 0.511 | NaN | 0.0 |
| 8 | AFG | Asia | Afghanistan | 2020-03-03 | 2.0 | 1.0 | 0.143 | 0.0 | 0.0 | 0.0 | ... | 597.029 | 9.59 | 0.0 | 0.0 | 37.746 | 0.5 | 64.83 | 0.511 | 0.0 | 0.0 |
| 9 | AFG | Asia | Afghanistan | 2020-03-04 | 4.0 | 2.0 | 0.429 | 0.0 | 0.0 | 0.0 | ... | 597.029 | 9.59 | 0.0 | 0.0 | 37.746 | 0.5 | 64.83 | 0.511 | 0.0 | 0.0 |
10 rows × 61 columns
# Absolute correlation of every numeric column with total_cases, strongest first
correlations = covid_df_copy.corr(numeric_only=True)['total_cases'].abs().sort_values(ascending=False).drop('total_cases',axis=0).to_frame()
correlations.plot(kind='bar',figsize=(12,10));
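The bar chart ranks every numeric column by the absolute value of its Pearson correlation with `total_cases`. Taking the absolute value puts strong negative correlations alongside strong positive ones. Recent pandas versions raise an error on text columns such as `location` unless `numeric_only=True` is passed. A sketch on hypothetical columns:

```python
import pandas as pd

# Hypothetical data: one perfectly positive, one perfectly negative, one weak column
df = pd.DataFrame({
    "location": ["A", "B", "C", "D"],
    "total_cases": [1.0, 2.0, 3.0, 4.0],
    "total_deaths": [2.0, 4.0, 6.0, 8.0],
    "stringency": [4.0, 3.0, 2.0, 1.0],
    "noise": [1.0, 3.0, 2.0, 1.0],
})

# |Pearson r| with total_cases, strongest first, excluding the target itself
correlations = (df.corr(numeric_only=True)["total_cases"]
                .abs()
                .sort_values(ascending=False)
                .drop("total_cases"))
print(correlations)
```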
# Function to compute the correlation between every pair of numeric features
def corr(df):
    """Return the pairwise correlation matrix of the numeric columns of df."""
    return df.corr(numeric_only=True)
corr(covid_df_copy).style.background_gradient(cmap="CMRmap_r")
| | total_cases | new_cases | new_cases_smoothed | total_deaths | new_deaths | new_deaths_smoothed | total_cases_per_million | new_cases_per_million | new_cases_smoothed_per_million | total_deaths_per_million | new_deaths_per_million | new_deaths_smoothed_per_million | reproduction_rate | icu_patients | icu_patients_per_million | hosp_patients | hosp_patients_per_million | weekly_icu_admissions | weekly_icu_admissions_per_million | weekly_hosp_admissions | weekly_hosp_admissions_per_million | new_tests | total_tests | total_tests_per_thousand | new_tests_per_thousand | new_tests_smoothed | new_tests_smoothed_per_thousand | positive_rate | tests_per_case | total_vaccinations | people_vaccinated | people_fully_vaccinated | new_vaccinations | new_vaccinations_smoothed | total_vaccinations_per_hundred | people_vaccinated_per_hundred | people_fully_vaccinated_per_hundred | new_vaccinations_smoothed_per_million | stringency_index | population | population_density | median_age | aged_65_older | aged_70_older | gdp_per_capita | extreme_poverty | cardiovasc_death_rate | diabetes_prevalence | female_smokers | male_smokers | handwashing_facilities | hospital_beds_per_thousand | life_expectancy | human_development_index | death_rate | population_coverage |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| total_cases | 1.000000 | 0.874162 | 0.888154 | 0.983313 | 0.848058 | 0.876899 | 0.140310 | 0.067746 | 0.082189 | 0.169975 | 0.071839 | 0.095372 | -0.053677 | 0.167050 | 0.044366 | 0.155312 | 0.019911 | 0.003233 | -0.001386 | 0.060245 | 0.007456 | 0.176094 | 0.210690 | 0.008818 | -0.002867 | 0.184394 | -0.003935 | -0.031146 | -0.016797 | 0.709946 | 0.716938 | 0.637546 | 0.699287 | 0.766562 | 0.061655 | 0.069220 | 0.045063 | 0.048157 | -0.143389 | 0.603536 | -0.022085 | -0.076957 | -0.028515 | -0.028776 | -0.023861 | -0.030544 | -0.103823 | -0.060493 | -0.016033 | -0.028103 | 0.003269 | -0.042292 | -0.181257 | -0.101104 | -0.006046 | 0.008587 |
| new_cases | 0.874162 | 1.000000 | 0.988770 | 0.894524 | 0.926816 | 0.923409 | 0.113827 | 0.112570 | 0.120201 | 0.146790 | 0.095649 | 0.114138 | -0.051484 | 0.207915 | 0.064766 | 0.198959 | 0.036703 | 0.003244 | -0.002144 | 0.064484 | 0.008696 | 0.197580 | 0.172913 | 0.000945 | -0.001406 | 0.200244 | -0.003066 | -0.021606 | -0.019654 | 0.388810 | 0.397605 | 0.328983 | 0.422973 | 0.454275 | 0.025148 | 0.029821 | 0.015983 | 0.018978 | -0.158905 | 0.661523 | -0.025341 | -0.086924 | -0.028955 | -0.029078 | -0.025403 | -0.036756 | -0.119975 | -0.072198 | -0.014383 | -0.031940 | -0.000706 | -0.045197 | -0.207708 | -0.115344 | -0.007221 | 0.000681 |
| new_cases_smoothed | 0.888154 | 0.988770 | 1.000000 | 0.906475 | 0.918715 | 0.939357 | 0.117262 | 0.102323 | 0.122074 | 0.149869 | 0.092143 | 0.118265 | -0.053868 | 0.212243 | 0.065476 | 0.202595 | 0.036813 | 0.007140 | 0.000651 | 0.078322 | 0.013282 | 0.197794 | 0.178206 | 0.001922 | -0.001699 | 0.204380 | -0.002600 | -0.021468 | -0.019736 | 0.392009 | 0.401355 | 0.328814 | 0.425698 | 0.463595 | 0.026080 | 0.030955 | 0.016457 | 0.020343 | -0.159214 | 0.664703 | -0.025452 | -0.087217 | -0.029013 | -0.029132 | -0.025329 | -0.036887 | -0.120503 | -0.072394 | -0.014427 | -0.032033 | -0.000643 | -0.045448 | -0.208477 | -0.115709 | -0.007022 | 0.001657 |
| total_deaths | 0.983313 | 0.894524 | 0.906475 | 1.000000 | 0.881153 | 0.910655 | 0.132417 | 0.064787 | 0.078132 | 0.187885 | 0.077617 | 0.103195 | -0.059426 | 0.141831 | 0.037020 | 0.134571 | 0.017193 | 0.002716 | -0.003176 | 0.050593 | 0.005673 | 0.145829 | 0.163375 | -0.002184 | -0.008226 | 0.152508 | -0.010534 | -0.026620 | -0.018823 | 0.665703 | 0.677479 | 0.602474 | 0.653199 | 0.716121 | 0.053589 | 0.062107 | 0.038132 | 0.039971 | -0.160678 | 0.640373 | -0.025668 | -0.087175 | -0.030545 | -0.029817 | -0.028874 | -0.037296 | -0.123561 | -0.072404 | -0.017543 | -0.035624 | 0.004469 | -0.049499 | -0.205350 | -0.114825 | -0.001535 | -0.002447 |
| new_deaths | 0.848058 | 0.926816 | 0.918715 | 0.881153 | 1.000000 | 0.971499 | 0.106557 | 0.085865 | 0.096648 | 0.153542 | 0.140592 | 0.147183 | -0.058144 | 0.153560 | 0.054363 | 0.150057 | 0.033229 | 0.000592 | -0.003276 | 0.028982 | 0.002989 | 0.143001 | 0.127968 | -0.009556 | -0.008984 | 0.140609 | -0.013168 | -0.011568 | -0.021013 | 0.414428 | 0.429076 | 0.352333 | 0.456267 | 0.487689 | 0.027582 | 0.034202 | 0.017284 | 0.020900 | -0.166023 | 0.679298 | -0.027743 | -0.093805 | -0.032032 | -0.031190 | -0.031716 | -0.040879 | -0.131013 | -0.078984 | -0.016850 | -0.035987 | 0.003875 | -0.049041 | -0.222544 | -0.124281 | 0.008762 | -0.009840 |
| new_deaths_smoothed | 0.876899 | 0.923409 | 0.939357 | 0.910655 | 0.971499 | 1.000000 | 0.111334 | 0.079456 | 0.097269 | 0.160171 | 0.115682 | 0.151781 | -0.061976 | 0.156295 | 0.054274 | 0.152396 | 0.032897 | 0.006190 | -0.001283 | 0.056241 | 0.009894 | 0.139525 | 0.134119 | -0.008888 | -0.010318 | 0.145490 | -0.013247 | -0.012398 | -0.021468 | 0.434847 | 0.450076 | 0.370088 | 0.467670 | 0.511886 | 0.029955 | 0.036991 | 0.018918 | 0.023029 | -0.169794 | 0.695507 | -0.028383 | -0.095988 | -0.032732 | -0.031858 | -0.032321 | -0.041816 | -0.134150 | -0.080795 | -0.017246 | -0.036835 | 0.004048 | -0.050265 | -0.227720 | -0.127139 | 0.009072 | -0.009179 |
| total_cases_per_million | 0.140310 | 0.113827 | 0.117262 | 0.132417 | 0.106557 | 0.111334 | 1.000000 | 0.588534 | 0.706893 | 0.830203 | 0.437590 | 0.580378 | 0.113436 | 0.189610 | 0.379739 | 0.222291 | 0.379806 | 0.082057 | 0.112887 | 0.074514 | 0.094312 | 0.184743 | 0.195524 | 0.457347 | 0.306539 | 0.201747 | 0.387508 | 0.200742 | -0.042170 | 0.103211 | 0.123252 | 0.120889 | 0.094056 | 0.107851 | 0.250308 | 0.261211 | 0.199877 | 0.238136 | 0.047408 | -0.032864 | 0.027198 | 0.128905 | 0.191239 | 0.182706 | 0.263809 | -0.194549 | -0.162586 | 0.042081 | 0.290454 | 0.090835 | -0.119412 | 0.137358 | 0.125502 | 0.163102 | -0.026569 | 0.456945 |
| new_cases_per_million | 0.067746 | 0.112570 | 0.102323 | 0.064787 | 0.085865 | 0.079456 | 0.588534 | 1.000000 | 0.845264 | 0.479247 | 0.525415 | 0.573209 | 0.161570 | 0.162670 | 0.380001 | 0.204918 | 0.422492 | 0.050180 | 0.074757 | 0.053348 | 0.077163 | 0.137356 | 0.105801 | 0.279451 | 0.248827 | 0.145248 | 0.293487 | 0.274428 | -0.044086 | 0.022902 | 0.030287 | 0.026165 | 0.025943 | 0.028801 | 0.128378 | 0.135856 | 0.093364 | 0.139727 | 0.060453 | -0.029948 | 0.027421 | 0.130760 | 0.201507 | 0.191215 | 0.186934 | -0.156516 | -0.110503 | 0.022212 | 0.277451 | 0.088948 | -0.110486 | 0.145796 | 0.104595 | 0.141638 | -0.031092 | 0.278986 |
| new_cases_smoothed_per_million | 0.082189 | 0.120201 | 0.122074 | 0.078132 | 0.096648 | 0.097269 | 0.706893 | 0.845264 | 1.000000 | 0.573430 | 0.548876 | 0.696430 | 0.172021 | 0.192666 | 0.453811 | 0.241222 | 0.503844 | 0.100595 | 0.144452 | 0.083918 | 0.134118 | 0.160067 | 0.128573 | 0.335165 | 0.282043 | 0.174152 | 0.348233 | 0.324376 | -0.051688 | 0.028140 | 0.037136 | 0.031701 | 0.030964 | 0.036003 | 0.152240 | 0.158618 | 0.112353 | 0.166602 | 0.076521 | -0.034918 | 0.032288 | 0.152835 | 0.235662 | 0.223876 | 0.219669 | -0.183398 | -0.130390 | 0.026003 | 0.324493 | 0.104041 | -0.129319 | 0.169874 | 0.122631 | 0.166580 | -0.035884 | 0.334678 |
| total_deaths_per_million | 0.169975 | 0.146790 | 0.149869 | 0.187885 | 0.153542 | 0.160171 | 0.830203 | 0.479247 | 0.573430 | 1.000000 | 0.470466 | 0.627766 | 0.113382 | 0.209748 | 0.394161 | 0.268544 | 0.407010 | 0.073335 | 0.081277 | 0.081456 | 0.089775 | 0.206481 | 0.196909 | 0.284645 | 0.213855 | 0.227330 | 0.266208 | 0.246467 | -0.049904 | 0.109736 | 0.134283 | 0.124896 | 0.101472 | 0.115260 | 0.164580 | 0.193320 | 0.123952 | 0.155402 | 0.067364 | -0.017309 | -0.024086 | 0.137055 | 0.261580 | 0.264115 | 0.200522 | -0.190581 | -0.210793 | -0.044598 | 0.319525 | 0.030308 | -0.101904 | 0.143193 | 0.087003 | 0.131226 | 0.011712 | 0.284074 |
| new_deaths_per_million | 0.071839 | 0.095649 | 0.092143 | 0.077617 | 0.140592 | 0.115682 | 0.437590 | 0.525415 | 0.548876 | 0.470466 | 1.000000 | 0.761659 | 0.095104 | 0.147454 | 0.384704 | 0.201959 | 0.455957 | 0.042898 | 0.068872 | 0.034135 | 0.075834 | 0.113285 | 0.086355 | 0.169078 | 0.171336 | 0.116431 | 0.197733 | 0.278953 | -0.038170 | 0.029405 | 0.039161 | 0.031898 | 0.033830 | 0.036462 | 0.059193 | 0.072336 | 0.041953 | 0.070315 | 0.087141 | -0.016930 | -0.004708 | 0.125312 | 0.204168 | 0.203046 | 0.111495 | -0.125432 | -0.094610 | -0.017334 | 0.239894 | 0.051983 | -0.071446 | 0.133625 | 0.060635 | 0.107483 | 0.041898 | 0.168618 |
| new_deaths_smoothed_per_million | 0.095372 | 0.114138 | 0.118265 | 0.103195 | 0.147183 | 0.151781 | 0.580378 | 0.573209 | 0.696430 | 0.627766 | 0.761659 | 1.000000 | 0.111029 | 0.188483 | 0.493884 | 0.258051 | 0.585551 | 0.088153 | 0.116361 | 0.077859 | 0.128290 | 0.139507 | 0.116111 | 0.224733 | 0.215499 | 0.153056 | 0.257801 | 0.352966 | -0.049257 | 0.040672 | 0.053856 | 0.044109 | 0.044128 | 0.050147 | 0.080723 | 0.096940 | 0.058043 | 0.092980 | 0.116187 | -0.021857 | -0.005661 | 0.162006 | 0.264513 | 0.263256 | 0.145045 | -0.162764 | -0.124260 | -0.022725 | 0.310304 | 0.066786 | -0.092489 | 0.172452 | 0.078847 | 0.139706 | 0.050747 | 0.224186 |
| reproduction_rate | -0.053677 | -0.051484 | -0.053868 | -0.059426 | -0.058144 | -0.061976 | 0.113436 | 0.161570 | 0.172021 | 0.113382 | 0.095104 | 0.111029 | 1.000000 | 0.038855 | 0.091811 | 0.049207 | 0.098755 | 0.025354 | 0.022764 | 0.017967 | 0.026981 | 0.074027 | 0.050926 | 0.127212 | 0.113066 | 0.080627 | 0.134845 | 0.279054 | -0.001919 | -0.032688 | -0.032454 | -0.029463 | -0.032368 | -0.034495 | -0.024869 | -0.026788 | -0.025987 | -0.056890 | 0.418368 | -0.042066 | -0.022857 | 0.376137 | 0.294493 | 0.281492 | 0.180238 | -0.011607 | 0.127363 | 0.052727 | 0.237481 | 0.215312 | 0.071026 | 0.172320 | 0.333756 | 0.357612 | -0.039500 | 0.125472 |
| icu_patients | 0.167050 | 0.207915 | 0.212243 | 0.141831 | 0.153560 | 0.156295 | 0.189610 | 0.162670 | 0.192666 | 0.209748 | 0.147454 | 0.188483 | 0.038855 | 1.000000 | 0.458657 | 0.935068 | 0.274910 | 0.057584 | 0.024722 | 0.344632 | 0.083035 | 0.693271 | 0.678932 | 0.124732 | 0.082291 | 0.723640 | 0.104286 | 0.047774 | -0.010850 | 0.069731 | 0.087829 | 0.077317 | 0.074826 | 0.096462 | 0.048609 | 0.060465 | 0.033312 | 0.049908 | 0.072597 | 0.008329 | -0.012730 | 0.108314 | 0.152195 | 0.154094 | 0.146779 | -0.042952 | -0.068656 | 0.025804 | 0.143981 | 0.025479 | -0.070677 | 0.053769 | 0.062059 | 0.101677 | 0.002655 | 0.124600 |
| icu_patients_per_million | 0.044366 | 0.064766 | 0.065476 | 0.037020 | 0.054363 | 0.054274 | 0.379739 | 0.380001 | 0.453811 | 0.394161 | 0.384704 | 0.493884 | 0.091811 | 0.458657 | 1.000000 | 0.482915 | 0.707177 | 0.174015 | 0.195108 | 0.164129 | 0.207849 | 0.262502 | 0.221972 | 0.255339 | 0.227051 | 0.290548 | 0.276112 | 0.145644 | -0.022177 | 0.018971 | 0.027037 | 0.022048 | 0.020296 | 0.026192 | 0.098063 | 0.111768 | 0.088027 | 0.083225 | 0.144835 | -0.027614 | -0.020657 | 0.262480 | 0.363494 | 0.372616 | 0.263075 | -0.097072 | -0.116509 | -0.037622 | 0.346683 | 0.091937 | -0.155586 | 0.191935 | 0.143717 | 0.213604 | 0.016448 | 0.255071 |
| hosp_patients | 0.155312 | 0.198959 | 0.202595 | 0.134571 | 0.150057 | 0.152396 | 0.222291 | 0.204918 | 0.241222 | 0.268544 | 0.201959 | 0.258051 | 0.049207 | 0.935068 | 0.482915 | 1.000000 | 0.441252 | 0.079062 | 0.033106 | 0.329161 | 0.085865 | 0.672587 | 0.637165 | 0.145622 | 0.113784 | 0.691770 | 0.135677 | 0.084717 | -0.012767 | 0.063867 | 0.081964 | 0.068257 | 0.069302 | 0.090230 | 0.056429 | 0.072026 | 0.035746 | 0.059107 | 0.089009 | 0.002732 | -0.014845 | 0.137496 | 0.194530 | 0.195155 | 0.154903 | -0.053775 | -0.083234 | 0.000321 | 0.180451 | 0.036959 | -0.086607 | 0.067886 | 0.077994 | 0.120263 | 0.008314 | 0.145460 |
| hosp_patients_per_million | 0.019911 | 0.036703 | 0.036813 | 0.017193 | 0.033229 | 0.032897 | 0.379806 | 0.422492 | 0.503844 | 0.407010 | 0.455957 | 0.585551 | 0.098755 | 0.274910 | 0.707177 | 0.441252 | 1.000000 | 0.131836 | 0.166779 | 0.096579 | 0.152166 | 0.197234 | 0.141853 | 0.286767 | 0.311630 | 0.205804 | 0.358578 | 0.213709 | -0.022640 | 0.008063 | 0.013947 | 0.009229 | 0.009232 | 0.012564 | 0.084606 | 0.100773 | 0.068859 | 0.080413 | 0.130002 | -0.032953 | -0.025507 | 0.274021 | 0.376931 | 0.381943 | 0.218247 | -0.103530 | -0.067565 | -0.065286 | 0.374788 | 0.121144 | -0.161093 | 0.222383 | 0.137395 | 0.208631 | 0.014673 | 0.286501 |
| weekly_icu_admissions | 0.003233 | 0.003244 | 0.007140 | 0.002716 | 0.000592 | 0.006190 | 0.082057 | 0.050180 | 0.100595 | 0.073335 | 0.042898 | 0.088153 | 0.025354 | 0.057584 | 0.174015 | 0.079062 | 0.131836 | 1.000000 | 0.725509 | 0.220380 | 0.452138 | 0.001681 | -0.002840 | 0.008788 | 0.010118 | 0.034354 | 0.045339 | 0.024463 | -0.004593 | 0.001039 | 0.001522 | 0.002628 | -0.000898 | 0.001147 | 0.034184 | 0.036544 | 0.034657 | 0.027089 | 0.027721 | -0.005597 | -0.003803 | 0.046404 | 0.067425 | 0.067026 | 0.039510 | -0.019339 | -0.031350 | -0.012927 | 0.080540 | 0.026038 | -0.029040 | 0.039539 | 0.029031 | 0.039938 | 0.005702 | 0.008722 |
| weekly_icu_admissions_per_million | -0.001386 | -0.002144 | 0.000651 | -0.003176 | -0.003276 | -0.001283 | 0.112887 | 0.074757 | 0.144452 | 0.081277 | 0.068872 | 0.116361 | 0.022764 | 0.024722 | 0.195108 | 0.033106 | 0.166779 | 0.725509 | 1.000000 | 0.140506 | 0.585200 | 0.001016 | -0.001073 | 0.045667 | 0.031036 | 0.014018 | 0.071336 | 0.037657 | -0.004930 | 0.000457 | 0.000732 | 0.002406 | -0.000671 | -0.000036 | 0.056805 | 0.060279 | 0.058159 | 0.045318 | 0.027692 | -0.007795 | -0.003417 | 0.049389 | 0.070503 | 0.069327 | 0.039891 | -0.021763 | -0.018713 | -0.011976 | 0.078978 | 0.032478 | -0.032793 | 0.042805 | 0.029440 | 0.044893 | -0.001001 | 0.045602 |
| weekly_hosp_admissions | 0.060245 | 0.064484 | 0.078322 | 0.050593 | 0.028982 | 0.056241 | 0.074514 | 0.053348 | 0.083918 | 0.081456 | 0.034135 | 0.077859 | 0.017967 | 0.344632 | 0.164129 | 0.329161 | 0.096579 | 0.220380 | 0.140506 | 1.000000 | 0.470875 | 0.163707 | 0.249412 | 0.048597 | 0.018438 | 0.260891 | 0.041091 | 0.029051 | -0.004227 | 0.024469 | 0.030287 | 0.029554 | 0.026390 | 0.035353 | 0.018043 | 0.022374 | 0.012907 | 0.019192 | 0.026894 | 0.002160 | -0.004970 | 0.042488 | 0.058814 | 0.059004 | 0.050983 | -0.015782 | -0.020448 | 0.009414 | 0.057261 | 0.012390 | -0.027558 | 0.021923 | 0.023308 | 0.038155 | 0.001086 | 0.048545 |
| weekly_hosp_admissions_per_million | 0.007456 | 0.008696 | 0.013282 | 0.005673 | 0.002989 | 0.009894 | 0.094312 | 0.077163 | 0.134118 | 0.089775 | 0.075834 | 0.128290 | 0.026981 | 0.083035 | 0.207849 | 0.085865 | 0.152166 | 0.452138 | 0.585200 | 0.470875 | 1.000000 | 0.030735 | 0.042801 | 0.046025 | 0.027161 | 0.056558 | 0.060606 | 0.059068 | -0.005786 | 0.002859 | 0.004342 | 0.003761 | 0.002078 | 0.004336 | 0.020414 | 0.023955 | 0.017885 | 0.018123 | 0.029022 | -0.007501 | -0.005164 | 0.065574 | 0.089623 | 0.090200 | 0.043464 | -0.021773 | -0.007358 | -0.005046 | 0.093810 | 0.034457 | -0.038569 | 0.057214 | 0.031422 | 0.049089 | 0.000277 | 0.045947 |
| new_tests | 0.176094 | 0.197580 | 0.197794 | 0.145829 | 0.143001 | 0.139525 | 0.184743 | 0.137356 | 0.160067 | 0.206481 | 0.113285 | 0.139507 | 0.074027 | 0.693271 | 0.262502 | 0.672587 | 0.197234 | 0.001681 | 0.001016 | 0.163707 | 0.030735 | 1.000000 | 0.850537 | 0.231315 | 0.259000 | 0.959910 | 0.224101 | 0.051630 | -0.006248 | 0.080641 | 0.103696 | 0.083187 | 0.081093 | 0.097095 | 0.096496 | 0.103842 | 0.046838 | 0.071248 | 0.097247 | 0.055954 | -0.015365 | 0.120822 | 0.136676 | 0.131513 | 0.143164 | -0.035680 | -0.032787 | 0.058241 | 0.129709 | 0.059662 | -0.042721 | 0.046531 | 0.071982 | 0.118512 | -0.009751 | 0.231107 |
| total_tests | 0.210690 | 0.172913 | 0.178206 | 0.163375 | 0.127968 | 0.134119 | 0.195524 | 0.105801 | 0.128573 | 0.196909 | 0.086355 | 0.116111 | 0.050926 | 0.678932 | 0.221972 | 0.637165 | 0.141853 | -0.002840 | -0.001073 | 0.249412 | 0.042801 | 0.850537 | 1.000000 | 0.233888 | 0.129574 | 0.877601 | 0.149869 | 0.038318 | -0.000818 | 0.141127 | 0.173132 | 0.165012 | 0.138028 | 0.164505 | 0.118158 | 0.122362 | 0.073246 | 0.092846 | 0.072322 | 0.052154 | -0.011999 | 0.095470 | 0.101786 | 0.097218 | 0.116549 | -0.026701 | -0.016536 | 0.059215 | 0.094105 | 0.049853 | -0.031025 | 0.032103 | 0.055947 | 0.095064 | -0.008914 | 0.233722 |
| total_tests_per_thousand | 0.008818 | 0.000945 | 0.001922 | -0.002184 | -0.009556 | -0.008888 | 0.457347 | 0.279451 | 0.335165 | 0.284645 | 0.169078 | 0.224733 | 0.127212 | 0.124732 | 0.255339 | 0.145622 | 0.286767 | 0.008788 | 0.045667 | 0.048597 | 0.046025 | 0.231315 | 0.233888 | 1.000000 | 0.700227 | 0.234009 | 0.808760 | 0.045558 | 0.049268 | 0.025935 | 0.029039 | 0.026121 | 0.018542 | 0.023061 | 0.296303 | 0.226624 | 0.171590 | 0.231448 | 0.082132 | -0.044473 | 0.002842 | 0.242261 | 0.234581 | 0.226684 | 0.384560 | -0.128383 | -0.094787 | 0.077516 | 0.243980 | 0.134093 | -0.127758 | 0.121365 | 0.166853 | 0.249154 | -0.027813 | 1.000000 |
| new_tests_per_thousand | -0.002867 | -0.001406 | -0.001699 | -0.008226 | -0.008984 | -0.010318 | 0.306539 | 0.248827 | 0.282043 | 0.213855 | 0.171336 | 0.215499 | 0.113066 | 0.082291 | 0.227051 | 0.113784 | 0.311630 | 0.010118 | 0.031036 | 0.018438 | 0.027161 | 0.259000 | 0.129574 | 0.700227 | 1.000000 | 0.195554 | 0.841009 | 0.034472 | 0.022400 | 0.007601 | 0.009698 | 0.004542 | 0.004217 | 0.004911 | 0.173409 | 0.136969 | 0.095031 | 0.140798 | 0.082602 | -0.035225 | -0.009117 | 0.196737 | 0.203759 | 0.197701 | 0.272082 | -0.100472 | -0.070550 | 0.042892 | 0.215084 | 0.111272 | -0.103854 | 0.114516 | 0.128829 | 0.192909 | -0.019392 | 0.700120 |
| new_tests_smoothed | 0.184394 | 0.200244 | 0.204380 | 0.152508 | 0.140609 | 0.145490 | 0.201747 | 0.145248 | 0.174152 | 0.227330 | 0.116431 | 0.153056 | 0.080627 | 0.723640 | 0.290548 | 0.691770 | 0.205804 | 0.034354 | 0.014018 | 0.260891 | 0.056558 | 0.959910 | 0.877601 | 0.234009 | 0.195554 | 1.000000 | 0.236990 | 0.053568 | -0.000302 | 0.085584 | 0.109490 | 0.090903 | 0.085717 | 0.103846 | 0.100421 | 0.108343 | 0.050790 | 0.075685 | 0.106010 | 0.055768 | -0.010963 | 0.143650 | 0.163709 | 0.161407 | 0.163299 | -0.043521 | -0.040159 | 0.060796 | 0.154964 | 0.069635 | -0.053442 | 0.066067 | 0.083632 | 0.136084 | -0.010460 | 0.233772 |
| new_tests_smoothed_per_thousand | -0.003935 | -0.003066 | -0.002600 | -0.010534 | -0.013168 | -0.013247 | 0.387508 | 0.293487 | 0.348233 | 0.266208 | 0.197733 | 0.257801 | 0.134845 | 0.104286 | 0.276112 | 0.135677 | 0.358578 | 0.045339 | 0.071336 | 0.041091 | 0.060606 | 0.224101 | 0.149869 | 0.808760 | 0.841009 | 0.236990 | 1.000000 | 0.044084 | 0.054507 | 0.008228 | 0.010767 | 0.005416 | 0.004095 | 0.005433 | 0.203618 | 0.163401 | 0.114343 | 0.167616 | 0.095737 | -0.044598 | 0.027929 | 0.248177 | 0.257854 | 0.250638 | 0.349434 | -0.127458 | -0.096131 | 0.058108 | 0.267484 | 0.140483 | -0.132594 | 0.144199 | 0.166006 | 0.246570 | -0.025004 | 0.808666 |
| positive_rate | -0.031146 | -0.021606 | -0.021468 | -0.026620 | -0.011568 | -0.012398 | 0.200742 | 0.274428 | 0.324376 | 0.246467 | 0.278953 | 0.352966 | 0.279054 | 0.047774 | 0.145644 | 0.084717 | 0.213709 | 0.024463 | 0.037657 | 0.029051 | 0.059068 | 0.051630 | 0.038318 | 0.045558 | 0.034472 | 0.053568 | 0.044084 | 1.000000 | -0.059293 | -0.023491 | -0.022768 | -0.019281 | -0.022766 | -0.025715 | -0.008491 | -0.005893 | -0.002592 | -0.023333 | 0.238004 | -0.062091 | -0.049598 | 0.172592 | 0.142323 | 0.121320 | 0.038846 | -0.010356 | -0.008876 | 0.049870 | 0.107077 | 0.075188 | 0.127191 | 0.036745 | 0.147926 | 0.187903 | -0.004562 | 0.044721 |
| tests_per_case | -0.016797 | -0.019654 | -0.019736 | -0.018823 | -0.021013 | -0.021468 | -0.042170 | -0.044086 | -0.051688 | -0.049904 | -0.038170 | -0.049257 | -0.001919 | -0.010850 | -0.022177 | -0.012767 | -0.022640 | -0.004593 | -0.004930 | -0.004227 | -0.005786 | -0.006248 | -0.000818 | 0.049268 | 0.022400 | -0.000302 | 0.054507 | -0.059293 | 1.000000 | -0.007437 | -0.007608 | -0.006871 | -0.007525 | -0.008812 | -0.009393 | -0.010211 | -0.007267 | -0.011795 | 0.016976 | -0.019104 | 0.038058 | 0.086532 | 0.071289 | 0.071439 | 0.093266 | -0.045423 | -0.053766 | 0.006233 | 0.025387 | 0.016185 | 0.005201 | 0.024104 | 0.065463 | 0.074840 | 0.004192 | 0.049068 |
| total_vaccinations | 0.709946 | 0.388810 | 0.392009 | 0.665703 | 0.414428 | 0.434847 | 0.103211 | 0.022902 | 0.028140 | 0.109736 | 0.029405 | 0.040672 | -0.032688 | 0.069731 | 0.018971 | 0.063867 | 0.008063 | 0.001039 | 0.000457 | 0.024469 | 0.002859 | 0.080641 | 0.141127 | 0.025935 | 0.007601 | 0.085584 | 0.008228 | -0.023491 | -0.007437 | 1.000000 | 0.985054 | 0.957749 | 0.875001 | 0.960024 | 0.118665 | 0.122346 | 0.090418 | 0.082558 | -0.073245 | 0.311227 | -0.010071 | -0.036586 | -0.012049 | -0.012172 | -0.006237 | -0.015202 | -0.051262 | -0.028161 | -0.006693 | -0.011914 | -0.001401 | -0.020649 | -0.085711 | -0.047074 | -0.002702 | 0.025837 |
| people_vaccinated | 0.716938 | 0.397605 | 0.401355 | 0.677479 | 0.429076 | 0.450076 | 0.123252 | 0.030287 | 0.037136 | 0.134283 | 0.039161 | 0.053856 | -0.032454 | 0.087829 | 0.027037 | 0.081964 | 0.013947 | 0.001522 | 0.000732 | 0.030287 | 0.004342 | 0.103696 | 0.173132 | 0.029039 | 0.009698 | 0.109490 | 0.010767 | -0.022768 | -0.007608 | 0.985054 | 1.000000 | 0.972234 | 0.858912 | 0.943917 | 0.128722 | 0.145309 | 0.099634 | 0.092726 | -0.071000 | 0.287409 | -0.010151 | -0.035110 | -0.008371 | -0.008399 | -0.004519 | -0.015686 | -0.053841 | -0.029690 | -0.002798 | -0.012807 | -0.002428 | -0.020879 | -0.085441 | -0.045670 | -0.002199 | 0.028937 |
| people_fully_vaccinated | 0.637546 | 0.328983 | 0.328814 | 0.602474 | 0.352333 | 0.370088 | 0.120889 | 0.026165 | 0.031701 | 0.124896 | 0.031898 | 0.044109 | -0.029463 | 0.077317 | 0.022048 | 0.068257 | 0.009229 | 0.002628 | 0.002406 | 0.029554 | 0.003761 | 0.083187 | 0.165012 | 0.026121 | 0.004542 | 0.090903 | 0.005416 | -0.019281 | -0.006871 | 0.957749 | 0.972234 | 1.000000 | 0.807367 | 0.887331 | 0.139038 | 0.145626 | 0.130317 | 0.093888 | -0.064170 | 0.242055 | -0.009298 | -0.031113 | -0.006861 | -0.007080 | -0.001097 | -0.014860 | -0.048160 | -0.024715 | -0.000799 | -0.011158 | -0.004413 | -0.016593 | -0.075702 | -0.039920 | -0.001935 | 0.026033 |
| new_vaccinations | 0.699287 | 0.422973 | 0.425698 | 0.653199 | 0.456267 | 0.467670 | 0.094056 | 0.025943 | 0.030964 | 0.101472 | 0.033830 | 0.044128 | -0.032368 | 0.074826 | 0.020296 | 0.069302 | 0.009232 | -0.000898 | -0.000671 | 0.026390 | 0.002078 | 0.081093 | 0.138028 | 0.018542 | 0.004217 | 0.085717 | 0.004095 | -0.022766 | -0.007525 | 0.875001 | 0.858912 | 0.807367 | 1.000000 | 0.899155 | 0.089690 | 0.095552 | 0.065368 | 0.072363 | -0.074309 | 0.318310 | -0.010010 | -0.038497 | -0.014155 | -0.014160 | -0.009345 | -0.013891 | -0.051031 | -0.028608 | -0.008240 | -0.012932 | 0.001317 | -0.022176 | -0.087618 | -0.049239 | -0.002707 | 0.018442 |
| new_vaccinations_smoothed | 0.766562 | 0.454275 | 0.463595 | 0.716121 | 0.487689 | 0.511886 | 0.107851 | 0.028801 | 0.036003 | 0.115260 | 0.036462 | 0.050147 | -0.034495 | 0.096462 | 0.026192 | 0.090230 | 0.012564 | 0.001147 | -0.000036 | 0.035353 | 0.004336 | 0.097095 | 0.164505 | 0.023061 | 0.004911 | 0.103846 | 0.005433 | -0.025715 | -0.008812 | 0.960024 | 0.943917 | 0.887331 | 0.899155 | 1.000000 | 0.100958 | 0.107417 | 0.073796 | 0.084578 | -0.077143 | 0.352143 | -0.011321 | -0.036704 | -0.011687 | -0.012751 | -0.008188 | -0.018000 | -0.055207 | -0.028498 | -0.008931 | -0.007200 | -0.002696 | -0.020466 | -0.092898 | -0.050473 | -0.003393 | 0.022941 |
| total_vaccinations_per_hundred | 0.061655 | 0.025148 | 0.026080 | 0.053589 | 0.027582 | 0.029955 | 0.250308 | 0.128378 | 0.152240 | 0.164580 | 0.059193 | 0.080723 | -0.024869 | 0.048609 | 0.098063 | 0.056429 | 0.084606 | 0.034184 | 0.056805 | 0.018043 | 0.020414 | 0.096496 | 0.118158 | 0.296303 | 0.173409 | 0.100421 | 0.203618 | -0.008491 | -0.009393 | 0.118665 | 0.128722 | 0.139038 | 0.089690 | 0.100958 | 1.000000 | 0.918229 | 0.892749 | 0.633313 | -0.027286 | -0.005554 | 0.033330 | -0.000745 | 0.028840 | 0.024467 | 0.068694 | -0.051714 | -0.080507 | -0.010601 | 0.049559 | 0.009010 | -0.061014 | 0.002118 | 0.038295 | -0.006267 | -0.011414 | 0.296202 |
| people_vaccinated_per_hundred | 0.069220 | 0.029821 | 0.030955 | 0.062107 | 0.034202 | 0.036991 | 0.261211 | 0.135856 | 0.158618 | 0.193320 | 0.072336 | 0.096940 | -0.026788 | 0.060465 | 0.111768 | 0.072026 | 0.100773 | 0.036544 | 0.060279 | 0.022374 | 0.023955 | 0.103842 | 0.122362 | 0.226624 | 0.136969 | 0.108343 | 0.163401 | -0.005893 | -0.010211 | 0.122346 | 0.145309 | 0.145626 | 0.095552 | 0.107417 | 0.918229 | 1.000000 | 0.882632 | 0.634031 | -0.023058 | -0.006301 | 0.037979 | -0.001204 | 0.046130 | 0.041364 | 0.054583 | -0.052676 | -0.095189 | -0.032465 | 0.065581 | 0.000998 | -0.066005 | 0.010227 | 0.036722 | -0.013074 | -0.010258 | 0.226494 |
| people_fully_vaccinated_per_hundred | 0.045063 | 0.015983 | 0.016457 | 0.038132 | 0.017284 | 0.018918 | 0.199877 | 0.093364 | 0.112353 | 0.123952 | 0.041953 | 0.058043 | -0.025987 | 0.033312 | 0.088027 | 0.035746 | 0.068859 | 0.034657 | 0.058159 | 0.012907 | 0.017885 | 0.046838 | 0.073246 | 0.171590 | 0.095031 | 0.050790 | 0.114343 | -0.002592 | -0.007267 | 0.090418 | 0.099634 | 0.130317 | 0.065368 | 0.073796 | 0.892749 | 0.882632 | 1.000000 | 0.508073 | -0.026437 | -0.005842 | 0.032462 | -0.009867 | 0.028714 | 0.023995 | 0.035012 | -0.037047 | -0.071317 | -0.029801 | 0.046449 | 0.000693 | -0.049580 | 0.006791 | 0.031069 | -0.014912 | -0.008141 | 0.171501 |
| new_vaccinations_smoothed_per_million | 0.048157 | 0.018978 | 0.020343 | 0.039971 | 0.020900 | 0.023029 | 0.238136 | 0.139727 | 0.166602 | 0.155402 | 0.070315 | 0.092980 | -0.056890 | 0.049908 | 0.083225 | 0.059107 | 0.080413 | 0.027089 | 0.045318 | 0.019192 | 0.018123 | 0.071248 | 0.092846 | 0.231448 | 0.140798 | 0.075685 | 0.167616 | -0.023333 | -0.011795 | 0.082558 | 0.092726 | 0.093888 | 0.072363 | 0.084578 | 0.633313 | 0.634031 | 0.508073 | 1.000000 | -0.067837 | -0.014540 | 0.082324 | -0.068941 | -0.009645 | -0.013092 | 0.064651 | -0.073926 | -0.136471 | -0.024017 | 0.024590 | -0.033433 | -0.080367 | -0.003140 | 0.045405 | -0.085219 | -0.014338 | 0.231352 |
| stringency_index | -0.143389 | -0.158905 | -0.159214 | -0.160678 | -0.166023 | -0.169794 | 0.047408 | 0.060453 | 0.076521 | 0.067364 | 0.087141 | 0.116187 | 0.418368 | 0.072597 | 0.144835 | 0.089009 | 0.130002 | 0.027721 | 0.027692 | 0.026894 | 0.029022 | 0.097247 | 0.072322 | 0.082132 | 0.082602 | 0.106010 | 0.095737 | 0.238004 | 0.016976 | -0.073245 | -0.071000 | -0.064170 | -0.074309 | -0.077143 | -0.027286 | -0.023058 | -0.026437 | -0.067837 | 1.000000 | -0.194006 | 0.017389 | 0.289722 | 0.161853 | 0.140006 | 0.137627 | -0.042154 | 0.159565 | 0.136867 | 0.124935 | 0.208290 | 0.138666 | 0.088965 | 0.329506 | 0.316832 | 0.026320 | 0.079944 |
| population | 0.603536 | 0.661523 | 0.664703 | 0.640373 | 0.679298 | 0.695507 | -0.032864 | -0.029948 | -0.034918 | -0.017309 | -0.016930 | -0.021857 | -0.042066 | 0.008329 | -0.027614 | 0.002732 | -0.032953 | -0.005597 | -0.007795 | 0.002160 | -0.007501 | 0.055954 | 0.052154 | -0.044473 | -0.035225 | 0.055768 | -0.044598 | -0.062091 | -0.019104 | 0.311227 | 0.287409 | 0.242055 | 0.318310 | 0.352143 | -0.005554 | -0.006301 | -0.005842 | -0.014540 | -0.194006 | 1.000000 | -0.026434 | -0.110267 | -0.063992 | -0.066128 | -0.070108 | -0.021541 | -0.106414 | -0.075626 | -0.067073 | -0.016264 | 0.032874 | -0.061791 | -0.235857 | -0.147653 | 0.004737 | -0.044799 |
| population_density | -0.022085 | -0.025341 | -0.025452 | -0.025668 | -0.027743 | -0.028383 | 0.027198 | 0.027421 | 0.032288 | -0.024086 | -0.004708 | -0.005661 | -0.022857 | -0.012730 | -0.020657 | -0.014845 | -0.025507 | -0.003803 | -0.003417 | -0.004970 | -0.005164 | -0.015365 | -0.011999 | 0.002842 | -0.009117 | -0.010963 | 0.027929 | -0.049598 | 0.038058 | -0.010071 | -0.010151 | -0.009298 | -0.010010 | -0.011321 | 0.033330 | 0.037979 | 0.032462 | 0.082324 | 0.017389 | -0.026434 | 1.000000 | -0.078929 | -0.038343 | -0.047520 | 0.098251 | -0.055936 | -0.179121 | 0.026118 | -0.062127 | -0.063928 | -0.070068 | 0.268463 | 0.123856 | -0.128936 | 0.000378 | 0.002514 |
| median_age | -0.076957 | -0.086924 | -0.087217 | -0.087175 | -0.093805 | -0.095988 | 0.128905 | 0.130760 | 0.152835 | 0.137055 | 0.125312 | 0.162006 | 0.376137 | 0.108314 | 0.262480 | 0.137496 | 0.274021 | 0.046404 | 0.049389 | 0.042488 | 0.065574 | 0.120822 | 0.095470 | 0.242261 | 0.196737 | 0.143650 | 0.248177 | 0.172592 | 0.086532 | -0.036586 | -0.035110 | -0.031113 | -0.038497 | -0.036704 | -0.000745 | -0.001204 | -0.009867 | -0.068941 | 0.289722 | -0.110267 | -0.078929 | 1.000000 | 0.853135 | 0.837727 | 0.541387 | -0.284861 | 0.151234 | 0.286323 | 0.596544 | 0.509389 | 0.010280 | 0.533658 | 0.660811 | 0.804194 | 0.002433 | 0.241955 |
| aged_65_older | -0.028515 | -0.028955 | -0.029013 | -0.030545 | -0.032032 | -0.032732 | 0.191239 | 0.201507 | 0.235662 | 0.261580 | 0.204168 | 0.264513 | 0.294493 | 0.152195 | 0.363494 | 0.194530 | 0.376931 | 0.067425 | 0.070503 | 0.058814 | 0.089623 | 0.136676 | 0.101786 | 0.234581 | 0.203759 | 0.163709 | 0.257854 | 0.142323 | 0.071289 | -0.012049 | -0.008371 | -0.006861 | -0.014155 | -0.011687 | 0.028840 | 0.046130 | 0.028714 | -0.009645 | 0.161853 | -0.063992 | -0.038343 | 0.853135 | 1.000000 | 0.965808 | 0.510197 | -0.314490 | -0.061668 | 0.077630 | 0.770230 | 0.417560 | -0.163717 | 0.627768 | 0.495638 | 0.690167 | 0.019838 | 0.233574 |
| aged_70_older | -0.028776 | -0.029078 | -0.029132 | -0.029817 | -0.031190 | -0.031858 | 0.182706 | 0.191215 | 0.223876 | 0.264115 | 0.203046 | 0.263256 | 0.281492 | 0.154094 | 0.372616 | 0.195155 | 0.381943 | 0.067026 | 0.069327 | 0.059004 | 0.090200 | 0.131513 | 0.097218 | 0.226684 | 0.197701 | 0.161407 | 0.250638 | 0.121320 | 0.071439 | -0.012172 | -0.008399 | -0.007080 | -0.014160 | -0.012751 | 0.024467 | 0.041364 | 0.023995 | -0.013092 | 0.140006 | -0.066128 | -0.047520 | 0.837727 | 0.965808 | 1.000000 | 0.492902 | -0.315749 | -0.098685 | 0.018650 | 0.729027 | 0.380441 | -0.204727 | 0.608093 | 0.485926 | 0.638597 | 0.022357 | 0.225655 |
| gdp_per_capita | -0.023861 | -0.025403 | -0.025329 | -0.028874 | -0.031716 | -0.032321 | 0.263809 | 0.186934 | 0.219669 | 0.200522 | 0.111495 | 0.145045 | 0.180238 | 0.146779 | 0.263075 | 0.154903 | 0.218247 | 0.039510 | 0.039891 | 0.050983 | 0.043464 | 0.143164 | 0.116549 | 0.384560 | 0.272082 | 0.163299 | 0.349434 | 0.038846 | 0.093266 | -0.006237 | -0.004519 | -0.001097 | -0.009345 | -0.008188 | 0.068694 | 0.054583 | 0.035012 | 0.064651 | 0.137627 | -0.070108 | 0.098251 | 0.541387 | 0.510197 | 0.492902 | 1.000000 | -0.346698 | -0.284708 | 0.282379 | 0.405520 | 0.192875 | -0.273995 | 0.342248 | 0.425693 | 0.548965 | -0.018833 | 0.384017 |
| extreme_poverty | -0.030544 | -0.036756 | -0.036887 | -0.037296 | -0.040879 | -0.041816 | -0.194549 | -0.156516 | -0.183398 | -0.190581 | -0.125432 | -0.162764 | -0.011607 | -0.042952 | -0.097072 | -0.053775 | -0.103530 | -0.019339 | -0.021763 | -0.015782 | -0.021773 | -0.035680 | -0.026701 | -0.128383 | -0.100472 | -0.043521 | -0.127458 | -0.010356 | -0.045423 | -0.015202 | -0.015686 | -0.014860 | -0.013891 | -0.018000 | -0.051714 | -0.052676 | -0.037047 | -0.073926 | -0.042154 | -0.021541 | -0.055936 | -0.284861 | -0.314490 | -0.315749 | -0.346698 | 1.000000 | 0.191294 | -0.273647 | -0.241286 | -0.090650 | -0.023015 | -0.317773 | -0.145519 | -0.253455 | -0.004740 | -0.129305 |
| cardiovasc_death_rate | -0.103823 | -0.119975 | -0.120503 | -0.123561 | -0.131013 | -0.134150 | -0.162586 | -0.110503 | -0.130390 | -0.210793 | -0.094610 | -0.124260 | 0.127363 | -0.068656 | -0.116509 | -0.083234 | -0.067565 | -0.031350 | -0.018713 | -0.020448 | -0.007358 | -0.032787 | -0.016536 | -0.094787 | -0.070550 | -0.040159 | -0.096131 | -0.008876 | -0.053766 | -0.051262 | -0.053841 | -0.048160 | -0.051031 | -0.055207 | -0.080507 | -0.095189 | -0.071317 | -0.136471 | 0.159565 | -0.106414 | -0.179121 | 0.151234 | -0.061668 | -0.098685 | -0.284708 | 0.191294 | 1.000000 | 0.244890 | -0.041821 | 0.334512 | 0.279880 | 0.063484 | 0.170786 | 0.152652 | 0.007339 | -0.098590 |
| diabetes_prevalence | -0.060493 | -0.072198 | -0.072394 | -0.072404 | -0.078984 | -0.080795 | 0.042081 | 0.022212 | 0.026003 | -0.044598 | -0.017334 | -0.022725 | 0.052727 | 0.025804 | -0.037622 | 0.000321 | -0.065286 | -0.012927 | -0.011976 | 0.009414 | -0.005046 | 0.058241 | 0.059215 | 0.077516 | 0.042892 | 0.060796 | 0.058108 | 0.049870 | 0.006233 | -0.028161 | -0.029690 | -0.024715 | -0.028608 | -0.028498 | -0.010601 | -0.032465 | -0.029801 | -0.024017 | 0.136867 | -0.075626 | 0.026118 | 0.286323 | 0.077630 | 0.018650 | 0.282379 | -0.273647 | 0.244890 | 1.000000 | 0.005562 | 0.164340 | 0.144972 | 0.103632 | 0.436338 | 0.461805 | -0.016454 | 0.075327 |
| female_smokers | -0.016033 | -0.014383 | -0.014427 | -0.017543 | -0.016850 | -0.017246 | 0.290454 | 0.277451 | 0.324493 | 0.319525 | 0.239894 | 0.310304 | 0.237481 | 0.143981 | 0.346683 | 0.180451 | 0.374788 | 0.080540 | 0.078978 | 0.057261 | 0.093810 | 0.129709 | 0.094105 | 0.243980 | 0.215084 | 0.154964 | 0.267484 | 0.107077 | 0.025387 | -0.006693 | -0.002798 | -0.000799 | -0.008240 | -0.008931 | 0.049559 | 0.065581 | 0.046449 | 0.024590 | 0.124935 | -0.067073 | -0.062127 | 0.596544 | 0.770230 | 0.729027 | 0.405520 | -0.241286 | -0.041821 | 0.005562 | 1.000000 | 0.481034 | -0.230735 | 0.472167 | 0.357628 | 0.519716 | 0.029037 | 0.243168 |
| male_smokers | -0.028103 | -0.031940 | -0.032033 | -0.035624 | -0.035987 | -0.036835 | 0.090835 | 0.088948 | 0.104041 | 0.030308 | 0.051983 | 0.066786 | 0.215312 | 0.025479 | 0.091937 | 0.036959 | 0.121144 | 0.026038 | 0.032478 | 0.012390 | 0.034457 | 0.059662 | 0.049853 | 0.134093 | 0.111272 | 0.069635 | 0.140483 | 0.075188 | 0.016185 | -0.011914 | -0.012807 | -0.011158 | -0.012932 | -0.007200 | 0.009010 | 0.000998 | 0.000693 | -0.033433 | 0.208290 | -0.016264 | -0.063928 | 0.509389 | 0.417560 | 0.380441 | 0.192875 | -0.090650 | 0.334512 | 0.164340 | 0.481034 | 1.000000 | 0.128322 | 0.366958 | 0.340875 | 0.461399 | -0.000105 | 0.132629 |
| handwashing_facilities | 0.003269 | -0.000706 | -0.000643 | 0.004469 | 0.003875 | 0.004048 | -0.119412 | -0.110486 | -0.129319 | -0.101904 | -0.071446 | -0.092489 | 0.071026 | -0.070677 | -0.155586 | -0.086607 | -0.161093 | -0.029040 | -0.032793 | -0.027558 | -0.038569 | -0.042721 | -0.031025 | -0.127758 | -0.103854 | -0.053442 | -0.132594 | 0.127191 | 0.005201 | -0.001401 | -0.002428 | -0.004413 | 0.001317 | -0.002696 | -0.061014 | -0.066005 | -0.049580 | -0.080367 | 0.138666 | 0.032874 | -0.070068 | 0.010280 | -0.163717 | -0.204727 | -0.273995 | -0.023015 | 0.279880 | 0.144972 | -0.230735 | 0.128322 | 1.000000 | -0.121914 | 0.089155 | 0.048794 | 0.009637 | -0.129136 |
| hospital_beds_per_thousand | -0.042292 | -0.045197 | -0.045448 | -0.049499 | -0.049041 | -0.050265 | 0.137358 | 0.145796 | 0.169874 | 0.143193 | 0.133625 | 0.172452 | 0.172320 | 0.053769 | 0.191935 | 0.067886 | 0.222383 | 0.039539 | 0.042805 | 0.021923 | 0.057214 | 0.046531 | 0.032103 | 0.121365 | 0.114516 | 0.066067 | 0.144199 | 0.036745 | 0.024104 | -0.020649 | -0.020879 | -0.016593 | -0.022176 | -0.020466 | 0.002118 | 0.010227 | 0.006791 | -0.003140 | 0.088965 | -0.061791 | 0.268463 | 0.533658 | 0.627768 | 0.608093 | 0.342248 | -0.317773 | 0.063484 | 0.103632 | 0.472167 | 0.366958 | -0.121914 | 1.000000 | 0.408082 | 0.452838 | 0.003035 | 0.120056 |
| life_expectancy | -0.181257 | -0.207708 | -0.208477 | -0.205350 | -0.222544 | -0.227720 | 0.125502 | 0.104595 | 0.122631 | 0.087003 | 0.060635 | 0.078847 | 0.333756 | 0.062059 | 0.143717 | 0.077994 | 0.137395 | 0.029031 | 0.029440 | 0.023308 | 0.031422 | 0.071982 | 0.055947 | 0.166853 | 0.128829 | 0.083632 | 0.166006 | 0.147926 | 0.065463 | -0.085711 | -0.085441 | -0.075702 | -0.087618 | -0.092898 | 0.038295 | 0.036722 | 0.031069 | 0.045405 | 0.329506 | -0.235857 | 0.123856 | 0.660811 | 0.495638 | 0.485926 | 0.425693 | -0.145519 | 0.170786 | 0.436338 | 0.357628 | 0.340875 | 0.089155 | 0.408082 | 1.000000 | 0.751472 | 0.001072 | 0.167732 |
| human_development_index | -0.101104 | -0.115344 | -0.115709 | -0.114825 | -0.124281 | -0.127139 | 0.163102 | 0.141638 | 0.166580 | 0.131226 | 0.107483 | 0.139706 | 0.357612 | 0.101677 | 0.213604 | 0.120263 | 0.208631 | 0.039938 | 0.044893 | 0.038155 | 0.049089 | 0.118512 | 0.095064 | 0.249154 | 0.192909 | 0.136084 | 0.246570 | 0.187903 | 0.074840 | -0.047074 | -0.045670 | -0.039920 | -0.049239 | -0.050473 | -0.006267 | -0.013074 | -0.014912 | -0.085219 | 0.316832 | -0.147653 | -0.128936 | 0.804194 | 0.690167 | 0.638597 | 0.548965 | -0.253455 | 0.152652 | 0.461805 | 0.519716 | 0.461399 | 0.048794 | 0.452838 | 0.751472 | 1.000000 | -0.005875 | 0.249810 |
| death_rate | -0.006046 | -0.007221 | -0.007022 | -0.001535 | 0.008762 | 0.009072 | -0.026569 | -0.031092 | -0.035884 | 0.011712 | 0.041898 | 0.050747 | -0.039500 | 0.002655 | 0.016448 | 0.008314 | 0.014673 | 0.005702 | -0.001001 | 0.001086 | 0.000277 | -0.009751 | -0.008914 | -0.027813 | -0.019392 | -0.010460 | -0.025004 | -0.004562 | 0.004192 | -0.002702 | -0.002199 | -0.001935 | -0.002707 | -0.003393 | -0.011414 | -0.010258 | -0.008141 | -0.014338 | 0.026320 | 0.004737 | 0.000378 | 0.002433 | 0.019838 | 0.022357 | -0.018833 | -0.004740 | 0.007339 | -0.016454 | 0.029037 | -0.000105 | 0.009637 | 0.003035 | 0.001072 | -0.005875 | 1.000000 | -0.027813 |
| population_coverage | 0.008587 | 0.000681 | 0.001657 | -0.002447 | -0.009840 | -0.009179 | 0.456945 | 0.278986 | 0.334678 | 0.284074 | 0.168618 | 0.224186 | 0.125472 | 0.124600 | 0.255071 | 0.145460 | 0.286501 | 0.008722 | 0.045602 | 0.048545 | 0.045947 | 0.231107 | 0.233722 | 1.000000 | 0.700120 | 0.233772 | 0.808666 | 0.044721 | 0.049068 | 0.025837 | 0.028937 | 0.026033 | 0.018442 | 0.022941 | 0.296202 | 0.226494 | 0.171501 | 0.231352 | 0.079944 | -0.044799 | 0.002514 | 0.241955 | 0.233574 | 0.225655 | 0.384017 | -0.129305 | -0.098590 | 0.075327 | 0.243168 | 0.132629 | -0.129136 | 0.120056 | 0.167732 | 0.249810 | -0.027813 | 1.000000 |
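A correlation matrix this wide is hard to scan by eye. The sketch below lists the feature pairs whose absolute correlation is at or above a threshold. It uses a hypothetical helper, `strong_pairs`, which is not part of the notebook or of pandas, and it works on any square matrix such as the output of pandas' `DataFrame.corr()`. The 0.8 threshold is an arbitrary assumption. Pairs that correlate almost perfectly, such as `total_tests_per_thousand` and `population_coverage` (1.000000 above), carry essentially the same information, so one of each pair is a candidate for removal.

```python
import numpy as np
import pandas as pd


def strong_pairs(corr: pd.DataFrame, threshold: float = 0.8) -> list:
    """Return (feature_a, feature_b, r) for every pair with |r| >= threshold.

    Only the upper triangle is scanned (k=1 skips the diagonal), so each
    pair is reported once and the trivial self-correlations of 1.0 are ignored.
    """
    values = corr.to_numpy()
    rows, cols = np.triu_indices(len(corr), k=1)
    pairs = [
        (corr.index[i], corr.columns[j], float(values[i, j]))
        for i, j in zip(rows, cols)
        if abs(values[i, j]) >= threshold
    ]
    # Strongest relationships first, regardless of sign
    return sorted(pairs, key=lambda p: abs(p[2]), reverse=True)


# A 4x4 excerpt of the matrix above (values copied from the table)
names = ["median_age", "aged_65_older", "aged_70_older", "death_rate"]
corr = pd.DataFrame(
    [[1.000000, 0.853135, 0.837727, 0.002433],
     [0.853135, 1.000000, 0.965808, 0.019838],
     [0.837727, 0.965808, 1.000000, 0.022357],
     [0.002433, 0.019838, 0.022357, 1.000000]],
    index=names,
    columns=names,
)
result = strong_pairs(corr)
for a, b, r in result:
    print(f"{a} ~ {b}: {r:.3f}")
```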
#owid_covid_data = pd.read_csv('https://raw.githubusercontent.com/owid/covid-19-data/master/public/data/owid-covid-data.csv')
#owid_covid_data.head()
# Select the columns to clean
ColumnToClean = ['total_cases', 'new_cases', 'total_deaths', 'new_deaths','aged_65_older','aged_70_older','gdp_per_capita','diabetes_prevalence','female_smokers','male_smokers','hospital_beds_per_thousand']
# Replace NaN in 'location' with an empty string
covid_df_copy[['location']] = covid_df_copy[['location']].fillna('')
# Replace NaN in the selected numeric columns with 0
covid_df_copy[ColumnToClean] = covid_df_copy[ColumnToClean].fillna(0)
# Keep only the aggregate rows for the whole world
covid_df_copy = covid_df_copy.query('location == "World"')
# Build the regression frame from 'date' plus the cleaned columns
# (.values on mixed types returns an object-dtype array, so every column here has dtype object)
Data_For_Regression = pd.DataFrame(columns=['date'] + ColumnToClean, data=covid_df_copy[['date'] + ColumnToClean].values)
Data_For_Regression.head()
| | date | total_cases | new_cases | total_deaths | new_deaths | aged_65_older | aged_70_older | gdp_per_capita | diabetes_prevalence | female_smokers | male_smokers | hospital_beds_per_thousand |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2020-01-22 | 557 | 0 | 17 | 0 | 8.696 | 5.355 | 15469.2 | 8.51 | 6.434 | 34.635 | 2.705 |
| 1 | 2020-01-23 | 655 | 98 | 18 | 1 | 8.696 | 5.355 | 15469.2 | 8.51 | 6.434 | 34.635 | 2.705 |
| 2 | 2020-01-24 | 941 | 286 | 26 | 8 | 8.696 | 5.355 | 15469.2 | 8.51 | 6.434 | 34.635 | 2.705 |
| 3 | 2020-01-25 | 1433 | 492 | 42 | 16 | 8.696 | 5.355 | 15469.2 | 8.51 | 6.434 | 34.635 | 2.705 |
| 4 | 2020-01-26 | 2118 | 685 | 56 | 14 | 8.696 | 5.355 | 15469.2 | 8.51 | 6.434 | 34.635 | 2.705 |
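The cleaning steps can be reproduced on a tiny frame. The `raw` values below are made up, apart from the first two World totals copied from the output above. The example also shows a side effect of building the regression frame through `.values`. Because `date` is a string column, the combined array is object-typed, and so is every resulting column. Selecting the columns directly keeps them numeric.

```python
import numpy as np
import pandas as pd

# Made-up miniature of the OWID layout (only the two World totals come from the output above)
raw = pd.DataFrame({
    "location": ["World", "World", "France", None],
    "date": ["2020-01-22", "2020-01-23", "2020-01-22", "2020-01-22"],
    "total_cases": [557.0, 655.0, np.nan, 3.0],
    "new_cases": [np.nan, 98.0, 0.0, 3.0],
})
value_cols = ["total_cases", "new_cases"]

clean = raw.copy()
clean["location"] = clean["location"].fillna("")   # missing location -> ""
clean[value_cols] = clean[value_cols].fillna(0)     # missing counts -> 0
world = clean.query('location == "World"')

# Selecting the columns directly keeps their numeric dtype...
direct = world[["date"] + value_cols].reset_index(drop=True)
# ...while going through .values mixes str and float into one object array
via_values = pd.DataFrame(world[["date"] + value_cols].values, columns=["date"] + value_cols)

print(direct.dtypes["total_cases"], via_values.dtypes["total_cases"])
```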
#set the index as date
Data_For_Regression['date'] = pd.to_datetime(Data_For_Regression['date'])
Data_For_Regression = Data_For_Regression.set_index('date')
Data_For_Regression.head()
| date | total_cases | new_cases | total_deaths | new_deaths | aged_65_older | aged_70_older | gdp_per_capita | diabetes_prevalence | female_smokers | male_smokers | hospital_beds_per_thousand |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2020-01-22 | 557 | 0 | 17 | 0 | 8.696 | 5.355 | 15469.2 | 8.51 | 6.434 | 34.635 | 2.705 |
| 2020-01-23 | 655 | 98 | 18 | 1 | 8.696 | 5.355 | 15469.2 | 8.51 | 6.434 | 34.635 | 2.705 |
| 2020-01-24 | 941 | 286 | 26 | 8 | 8.696 | 5.355 | 15469.2 | 8.51 | 6.434 | 34.635 | 2.705 |
| 2020-01-25 | 1433 | 492 | 42 | 16 | 8.696 | 5.355 | 15469.2 | 8.51 | 6.434 | 34.635 | 2.705 |
| 2020-01-26 | 2118 | 685 | 56 | 14 | 8.696 | 5.355 | 15469.2 | 8.51 | 6.434 | 34.635 | 2.705 |
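The forecast later shifts the label by a number of rows. That shift equals a number of days only if there is exactly one row per calendar day. The 424 rows do match the 424 calendar days from 2020-01-22 to 2021-03-20. A small check like the hypothetical `missing_days` helper below makes this assumption explicit.

```python
import pandas as pd


def missing_days(index: pd.DatetimeIndex) -> pd.DatetimeIndex:
    """Calendar days between the first and last date that are absent from `index`."""
    full = pd.date_range(index.min(), index.max(), freq="D")
    return full.difference(index)


# Toy index with one gap (2020-01-24)
idx = pd.DatetimeIndex(["2020-01-22", "2020-01-23", "2020-01-25"])
print(missing_days(idx))
# In the notebook, missing_days(Data_For_Regression.index) should come back empty
```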
# Plot cumulative confirmed cases for the World
Data_For_Regression['total_cases'].plot(figsize=(15,6), color="green")
plt.legend(loc=4)
plt.xlabel('Date')
plt.ylabel('Cases')
plt.show()
Data_For_Regression['total_deaths'].plot(figsize=(15,6), color="red")
plt.xlabel('Date')
plt.ylabel('Deaths')
plt.show()
Data_For_Regression['new_cases'].plot(figsize=(15,6), color="blue")
plt.xlabel('Date')
plt.ylabel('New Cases')
plt.show()
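The `new_cases` curve is noisy from day to day. The sketch below runs two quick checks, assuming a daily index:

- `new_cases` should equal the day-over-day difference of `total_cases`, as it does in the first rows above (655 - 557 = 98, 941 - 655 = 286).
- A trailing rolling mean smooths the series.

OWID also publishes a 7-day smoothed column (`new_cases_smoothed`). The window of 3 below only keeps the toy example short. `daily_and_smoothed` is a hypothetical helper.

```python
import pandas as pd


def daily_and_smoothed(total_cases: pd.Series, window: int = 7) -> pd.DataFrame:
    """Daily increments of a cumulative series plus their trailing rolling mean."""
    daily = total_cases.diff()  # first value is NaN: there is no previous day
    return pd.DataFrame({
        "daily": daily,
        "smoothed": daily.rolling(window).mean(),  # NaN until `window` non-NaN values are available
    })


# First five World totals from the output above
totals = pd.Series([557, 655, 941, 1433, 2118],
                   index=pd.date_range("2020-01-22", periods=5, freq="D"))
out = daily_and_smoothed(totals, window=3)
print(out)
# In the notebook: daily_and_smoothed(Data_For_Regression['total_cases'].astype(float))['smoothed'].plot()
```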
# Forecast the cumulative death count
forecast_col = 'total_deaths'
# Forecast horizon: 30 days ahead
forecast_out = 30
print('length =',len(Data_For_Regression), "and forecast_out =", forecast_out)
length = 424 and forecast_out = 30
# Create the label 'temp' by shifting 'total_deaths' up by forecast_out rows (row t gets the value from row t + forecast_out)
Data_For_Regression['temp'] = Data_For_Regression[forecast_col].shift(-forecast_out)
print(Data_For_Regression.head(2))
print('\n')
# The last forecast_out rows have no future value, so their label is NaN
print(Data_For_Regression.tail(2))
total_cases new_cases total_deaths new_deaths aged_65_older \
date
2020-01-22 557 0 17 0 8.696
2020-01-23 655 98 18 1 8.696
aged_70_older gdp_per_capita diabetes_prevalence female_smokers \
date
2020-01-22 5.355 15469.2 8.51 6.434
2020-01-23 5.355 15469.2 8.51 6.434
male_smokers hospital_beds_per_thousand temp
date
2020-01-22 34.635 2.705 2252
2020-01-23 34.635 2.705 2459
total_cases new_cases total_deaths new_deaths aged_65_older \
date
2021-03-19 1.22316e+08 526273 2.70144e+06 10410 8.696
2021-03-20 1.22814e+08 498140 2.70964e+06 8194 8.696
aged_70_older gdp_per_capita diabetes_prevalence female_smokers \
date
2021-03-19 5.355 15469.2 8.51 6.434
2021-03-20 5.355 15469.2 8.51 6.434
male_smokers hospital_beds_per_thousand temp
date
2021-03-19 34.635 2.705 NaN
2021-03-20 34.635 2.705 NaN
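The label construction relies on `shift(-n)`, which moves values up by `n` rows and leaves the last `n` rows empty. The sketch applies it to the first five World death totals from the output above, using a horizon of 2 to keep it short. `add_future_label` is a hypothetical wrapper around the same one-liner.

```python
import pandas as pd


def add_future_label(df: pd.DataFrame, col: str, horizon: int, label: str = "temp") -> pd.DataFrame:
    """Copy of `df` with `label` holding `col` as observed `horizon` rows later."""
    out = df.copy()
    out[label] = out[col].shift(-horizon)  # last `horizon` rows get NaN
    return out


# First five World total_deaths values from the output above
deaths = pd.DataFrame({"total_deaths": [17, 18, 26, 42, 56]},
                      index=pd.date_range("2020-01-22", periods=5, freq="D"))
labelled = add_future_label(deaths, "total_deaths", horizon=2)
print(labelled)
```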
# Build the feature matrix X from every column except the label 'temp'
X = np.array(Data_For_Regression.drop(columns=['temp']))
# Standardize each feature to zero mean and unit variance with sklearn's preprocessing.scale
X = preprocessing.scale(X)
print(X[1,:])
[-0.96277535 -1.30745477 -1.17772698 -1.57016705 0. 0. 0. 0. 0. 0. 0. ]
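`preprocessing.scale` centres each column and divides by its population standard deviation. A zero-variance column, such as the World's constant `aged_65_older`, is not divided, so after centring it is all zeros. This is why seven of the eleven scaled features printed above are 0.

Here the scaling is computed over all rows, test rows included. Fitting a `StandardScaler` on the training rows only is the usual way to avoid leaking test information.

A comparison with the formula by hand follows. The constant column's value is made up.

```python
import numpy as np
from sklearn import preprocessing

# Column 0: the first three World total_cases; column 1: a made-up constant feature
X_demo = np.array([[557.0, 5.0],
                   [655.0, 5.0],
                   [941.0, 5.0]])
scaled = preprocessing.scale(X_demo)

# The same transformation by hand: centre each column, divide by its population std (ddof=0)
mean = X_demo.mean(axis=0)
std = X_demo.std(axis=0)
std[std == 0] = 1.0  # zero-variance columns are not divided, so they become all zeros
manual = (X_demo - mean) / std
print(scaled)
```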
# The last forecast_out rows of X have no label (their 'temp' is NaN).
# Set them aside in X_forecast_out and keep the remaining rows in X for training and testing.
X_forecast_out = X[-forecast_out:]
X = X[:-forecast_out]
print ("Length of X_forecast_out:", len(X_forecast_out), "& Length of X :", len(X))
Length of X_forecast_out: 30 & Length of X : 394
# Define the label vector y and drop its last forecast_out entries (the NaN labels)
# so that len(y) == len(X)
y = np.array(Data_For_Regression['temp'])
y = y[:-forecast_out]
print('Length of y: ',len(y))
Length of y: 394
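Shifting the label up by `forecast_out` and then trimming the last `forecast_out` rows amounts to pairing feature row t with the target at row t + forecast_out. The hypothetical helper below does that pairing directly with slicing, which makes the length check explicit. It assumes the rows are ordered by date with no gaps.

```python
import numpy as np


def split_for_forecast(features: np.ndarray, target: np.ndarray, horizon: int):
    """Pair each feature row with the target `horizon` rows later.

    Returns (X, y, X_future): X and y are the rows whose future target is known,
    X_future holds the last `horizon` rows whose target is not yet observed.
    """
    if horizon <= 0 or horizon >= len(features):
        raise ValueError("horizon must be between 1 and len(features) - 1")
    X = features[:-horizon]
    y = target[horizon:]
    X_future = features[-horizon:]
    return X, y, X_future


# Made-up features; the target reuses the first five World total_deaths values
features = np.arange(10, dtype=float).reshape(5, 2)
target = np.array([17.0, 18.0, 26.0, 42.0, 56.0])
X, y, X_future = split_for_forecast(features, target, horizon=2)
print(len(X), len(y), len(X_future))
```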
# Split into training and test sets
# test_size=0.2: 20% of the rows are held out for testing (rows are shuffled by default)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
print('length of X_train and x_test: ', len(X_train), len(X_test))
length of X_train and x_test: 315 79
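With the default shuffle, the test days are scattered among the training days. Each test day therefore has its neighbours in the training set, and the score mostly measures interpolation. A chronological hold-out, where the latest days form the test set, is closer to the forecasting task; the sketch below shows it on made-up data using `train_test_split`'s `shuffle=False`. The 315/79 split above also follows from scikit-learn rounding the test size up: ceil(0.2 × 394) = 79.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Ten made-up daily rows in chronological order
X_demo = np.arange(20, dtype=float).reshape(10, 2)
y_demo = np.arange(10, dtype=float)

# shuffle=False keeps the order: the first 80% of days train, the last 20% test
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, shuffle=False)
print(y_tr, y_te)
```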
# Create linear regression object
lr = LinearRegression()
# Train the model using the training sets
lr.fit(X_train, y_train)
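# Evaluate on the test set (score() returns the coefficient of determination R^2, not a classification accuracy)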
accuracy = lr.score(X_test, y_test)
print("Accuracy of Linear Regression: ", accuracy)
Accuracy of Linear Regression: 0.9977341359065376
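`LinearRegression.score` returns R^2 = 1 - SS_res / SS_tot, so the printed 0.9977 is an R^2 on the held-out rows. The sketch below checks that identity on synthetic data; the coefficients 3 and -2 and the noise level are arbitrary. `total_deaths` rises steadily over the whole period, and the test rows sit between training rows. A high R^2 here therefore likely reflects interpolation between neighbouring days more than the quality of a 30-day-ahead forecast.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Synthetic regression problem with a known linear signal plus noise
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 2))
y_demo = 3.0 * X_demo[:, 0] - 2.0 * X_demo[:, 1] + rng.normal(scale=0.5, size=50)

model = LinearRegression().fit(X_demo, y_demo)
pred = model.predict(X_demo)

# R^2 = 1 - SS_res / SS_tot, computed by hand
ss_res = np.sum((y_demo - pred) ** 2)
ss_tot = np.sum((y_demo - y_demo.mean()) ** 2)
r2_manual = 1.0 - ss_res / ss_tot
print(model.score(X_demo, y_demo), r2_manual)
```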
# Predict total_deaths 30 days after each row of X_forecast_out
forecast_prediction = lr.predict(X_forecast_out)
print(forecast_prediction)
[2761369.60822026 2742437.89032144 2716729.72549334 2723886.09683447 2788747.50374794 2821086.80735987 2819915.21601815 2830601.30857385 2810759.52283328 2776173.46813663 2795955.78519842 2827930.95955923 2878011.83052154 2883045.16751441 2893069.35882272 2877604.12019003 2855573.53005105 2854086.52866266 2915178.08026103 2937093.14712452 2950239.62234392 2962518.40862667 2956578.76082901 2916684.33256088 2929072.89997022 2992631.12646166 3021011.48961908 3038624.40664073 3041431.8177346 3028397.38128222]
Data_For_Regression.tail()
| date | total_cases | new_cases | total_deaths | new_deaths | aged_65_older | aged_70_older | gdp_per_capita | diabetes_prevalence | female_smokers | male_smokers | hospital_beds_per_thousand | temp |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2021-03-16 | 1.20697e+08 | 472966 | 2.67045e+06 | 9997 | 8.696 | 5.355 | 15469.2 | 8.51 | 6.434 | 34.635 | 2.705 | NaN |
| 2021-03-17 | 1.21236e+08 | 538804 | 2.68052e+06 | 10063 | 8.696 | 5.355 | 15469.2 | 8.51 | 6.434 | 34.635 | 2.705 | NaN |
| 2021-03-18 | 1.21789e+08 | 553312 | 2.69104e+06 | 10519 | 8.696 | 5.355 | 15469.2 | 8.51 | 6.434 | 34.635 | 2.705 | NaN |
| 2021-03-19 | 1.22316e+08 | 526273 | 2.70144e+06 | 10410 | 8.696 | 5.355 | 15469.2 | 8.51 | 6.434 | 34.635 | 2.705 | NaN |
| 2021-03-20 | 1.22814e+08 | 498140 | 2.70964e+06 | 8194 | 8.696 | 5.355 | 15469.2 | 8.51 | 6.434 | 34.635 | 2.705 | NaN |
last_date = Data_For_Regression.iloc[-1].name
last_date
Timestamp('2021-03-20 00:00:00')
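Each value in `forecast_prediction` estimates `total_deaths` 30 days after one of the last 30 rows (2021-02-19 to 2021-03-20), so the 30 values fall on 2021-03-21 to 2021-04-19. The printed predictions are not monotone (for example 2,761,370 followed by 2,742,438), even though a cumulative count cannot decrease. The hypothetical helper below attaches those dates with `pd.Timedelta` and `pd.date_range`.

```python
import numpy as np
import pandas as pd


def label_forecast(predictions, last_date: pd.Timestamp) -> pd.Series:
    """Index `predictions` by consecutive days starting the day after `last_date`."""
    start = last_date + pd.Timedelta(days=1)
    dates = pd.date_range(start, periods=len(predictions), freq="D")
    return pd.Series(np.asarray(predictions), index=dates, name="forecast")


# First three predictions printed above, rounded to one decimal
preds = [2761369.6, 2742437.9, 2716729.7]
forecast = label_forecast(preds, pd.Timestamp("2021-03-20"))
print(forecast)
# In the notebook: label_forecast(forecast_prediction, last_date)
```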
# First forecast date: the day after the last observed date
todays_date = last_date + timedelta(days=1)
index = pd.date_range(todays_date, periods=forecast_out, freq='D')
columns = ['total_cases', 'new_cases', 'total_deaths', 'new_deaths','aged_65_older','aged_70_older','gdp_per_capita','diabetes_prevalence','female_smokers','male_smokers','hospital_beds_per_thousand','temp','forecast']
temp_df = pd.DataFrame(index=index, columns=columns)
temp_df
| | total_cases | new_cases | total_deaths | new_deaths | aged_65_older | aged_70_older | gdp_per_capita | diabetes_prevalence | female_smokers | male_smokers | hospital_beds_per_thousand | temp | forecast |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2021-03-21 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2021-03-22 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2021-03-23 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2021-03-24 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2021-03-25 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2021-03-26 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2021-03-27 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2021-03-28 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2021-03-29 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2021-03-30 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2021-03-31 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2021-04-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2021-04-02 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2021-04-03 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2021-04-04 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2021-04-05 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2021-04-06 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2021-04-07 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2021-04-08 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2021-04-09 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2021-04-10 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2021-04-11 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2021-04-12 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2021-04-13 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2021-04-14 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2021-04-15 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2021-04-16 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2021-04-17 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2021-04-18 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2021-04-19 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
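The empty frame above is indexed by the 30 days that follow the last observed date. A minimal sketch, assuming pandas, of building that daily index in one step; the helper name `future_daily_index` is hypothetical, not part of the notebook.

```python
import pandas as pd


def future_daily_index(last_date, periods=30):
    """Daily DatetimeIndex that starts on the day after `last_date`.

    normalize() sets the time of day to midnight, so a timestamp such as
    '2021-03-20 15:30' still yields an index starting at '2021-03-21'.
    """
    start = pd.Timestamp(last_date).normalize() + pd.Timedelta(days=1)
    return pd.date_range(start, periods=periods, freq="D")


index = future_daily_index(pd.Timestamp("2021-03-20"), periods=30)
print(index[0].date(), index[-1].date())
```

With the last date of this dataset (2021-03-20) the index runs from 2021-03-21 to 2021-04-19, matching the rows of the table above.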
# Write each linear-regression forecast value into the 'forecast' column (position 12)
for j, value in enumerate(forecast_prediction):
    temp_df.iat[j, 12] = value
temp_df
| | total_cases | new_cases | total_deaths | new_deaths | aged_65_older | aged_70_older | gdp_per_capita | diabetes_prevalence | female_smokers | male_smokers | hospital_beds_per_thousand | temp | forecast |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2021-03-21 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2.76137e+06 |
| 2021-03-22 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2.74244e+06 |
| 2021-03-23 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2.71673e+06 |
| 2021-03-24 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2.72389e+06 |
| 2021-03-25 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2.78875e+06 |
| 2021-03-26 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2.82109e+06 |
| 2021-03-27 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2.81992e+06 |
| 2021-03-28 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2.8306e+06 |
| 2021-03-29 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2.81076e+06 |
| 2021-03-30 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2.77617e+06 |
| 2021-03-31 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2.79596e+06 |
| 2021-04-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2.82793e+06 |
| 2021-04-02 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2.87801e+06 |
| 2021-04-03 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2.88305e+06 |
| 2021-04-04 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2.89307e+06 |
| 2021-04-05 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2.8776e+06 |
| 2021-04-06 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2.85557e+06 |
| 2021-04-07 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2.85409e+06 |
| 2021-04-08 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2.91518e+06 |
| 2021-04-09 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2.93709e+06 |
| 2021-04-10 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2.95024e+06 |
| 2021-04-11 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2.96252e+06 |
| 2021-04-12 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2.95658e+06 |
| 2021-04-13 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2.91668e+06 |
| 2021-04-14 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2.92907e+06 |
| 2021-04-15 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2.99263e+06 |
| 2021-04-16 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 3.02101e+06 |
| 2021-04-17 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 3.03862e+06 |
| 2021-04-18 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 3.04143e+06 |
| 2021-04-19 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 3.0284e+06 |
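Filling the 'forecast' column one cell at a time works, but the same result can be reached with a single vectorized assignment. A minimal sketch, assuming the predictions are a 1-D array with one value per future day; the short `history` and `forecast_prediction` values are illustrative stand-ins copied from the tables above, not the full series. Aligning history and forecast in one frame also makes the combined plot a single call.

```python
import numpy as np
import pandas as pd

# Illustrative stand-ins for Data_For_Regression['total_deaths'] and forecast_prediction
history = pd.Series([2.70144e6, 2.70964e6],
                    index=pd.to_datetime(["2021-03-19", "2021-03-20"]),
                    name="total_deaths")
forecast_prediction = np.array([2.76137e6, 2.74244e6, 2.71673e6])

# Future dates start the day after the last observed date
future_index = pd.date_range(history.index[-1] + pd.Timedelta(days=1),
                             periods=len(forecast_prediction), freq="D")

# One vectorized assignment instead of a loop over temp_df.iat[j, 12]
forecast = pd.Series(forecast_prediction, index=future_index, name="forecast")

# Outer-aligned frame: NaN wherever a series has no value for that date
combined = pd.concat([history, forecast], axis=1)
print(combined)
```

Calling `combined.plot()` would then draw both lines on the same date axis.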
# Plot historical total deaths (red) together with the linear-regression forecast (orange); XGBoost is tried below for comparison
Data_For_Regression['total_deaths'].plot(figsize=(15,6), color="red")
temp_df['forecast'].plot(figsize=(15,6), color="orange")
plt.xlabel('Date')
plt.ylabel('Death')
plt.show()
# Try an XGBoost regressor to see whether it improves on the linear-regression forecast
xgb_model = xgb.XGBRegressor(objective='reg:squarederror',
                             colsample_bytree=0.4,
                             gamma=0,
                             learning_rate=0.07,
                             max_depth=3,
                             min_child_weight=1.5,
                             n_estimators=10000,
                             reg_alpha=0.75,
                             reg_lambda=0.45,
                             subsample=0.6)
xgb_model.fit(X_train, y_train)
xgforecast_prediction = xgb_model.predict(X_forecast_out)
xgforecast_prediction
array([2704656.8, 2664913.5, 2536690.5, 2613490.5, 2718172.5, 2689632. ,
2673268.8, 2692658.8, 2692065.8, 2523291.5, 2579734.8, 2612704.2,
2710746.8, 2683083.2, 2674768.8, 2685463.5, 2556501.8, 2596244.8,
2673547. , 2670320.2, 2676758.2, 2687230.2, 2673305.8, 2559467. ,
2608374.2, 2668408.5, 2656320.8, 2672826.5, 2676134.5, 2675404.8],
dtype=float32)
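The XGBoost model above is fit and then used to forecast without a held-out check. For daily data, a hold-out that keeps time order (train on earlier days, test on later days) avoids training on dates that come after the test dates, which a shuffled split allows. A minimal NumPy sketch, assuming the rows are already sorted by date; `chronological_split` and `mae` are hypothetical helper names.

```python
import numpy as np


def chronological_split(X, y, test_fraction=0.2):
    """Split time-ordered rows so every test row comes after every training row."""
    X = np.asarray(X)
    y = np.asarray(y)
    n_test = int(round(len(X) * test_fraction))
    cut = len(X) - n_test
    return X[:cut], X[cut:], y[:cut], y[cut:]


def mae(y_true, y_pred):
    """Mean absolute error between two equal-length arrays."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))


# Toy data: feature = day number, target = a simple trend
X = np.arange(10).reshape(-1, 1)
y = np.arange(10) * 2.0
X_tr, X_te, y_tr, y_te = chronological_split(X, y, test_fraction=0.2)
print(X_te.ravel())          # the last two days are held out
print(mae(y_te, y_te + 1))   # every prediction is off by exactly 1
```

Comparing `mae` for the linear-regression and XGBoost predictions on the same held-out days would give a numeric basis for choosing between the two curves.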
# Overwrite the 'forecast' column of the temporary frame with the XGBoost predictions
for j, value in enumerate(xgforecast_prediction):
    temp_df.iat[j, 12] = value
Data_For_Regression['total_deaths'].plot(figsize=(15,6), color="red")
temp_df['forecast'].plot(figsize=(15,6), color="orange")
plt.xlabel('Date')
plt.ylabel('Death')
plt.show()
We decided to use the population aged 65 and over, diabetes_prevalence and cardiovasc_death_rate because over 80% of deaths were among people aged 65 and over, and the CDC has stated that 94% of deaths involved some underlying health condition. We also used life expectancy per country to account for possible deficiencies in the health care system. Johns Hopkins University has listed several conditions, such as heart disease and diabetes, that are known to be made worse by obesity; since the dataset has no obesity column, cardiovasc_death_rate and diabetes_prevalence stand in for these health conditions. Our hypothesis is that the COVID-19 mortality ratio can be predicted more accurately from both the population aged 65 and over and these health indicators than from the population aged 65 and over alone. If so, this may show that creating a healthier population is the best way to prevent the kind of devastation the world is currently facing from happening in future pandemics.
After comparing the linear-regression and XGBoost forecasts in the graphs above, we want to see what accuracy a machine-learning model can achieve with this data, and whether it can do better than we expect. We initially chose to classify rows into a HighRisk category, since classification may be more accurate than regression here. Could a more precise algorithm give a model better suited to this data?
covid_df_copy = covid_df.copy()
covid_df_copy.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 76215 entries, 0 to 76214
Data columns (total 61 columns):
 #   Column                                 Non-Null Count  Dtype
---  ------                                 --------------  -----
 0   iso_code                               76215 non-null  object
 1   continent                              72473 non-null  object
 2   location                               76215 non-null  object
 3   date                                   76215 non-null  object
 4   total_cases                            76215 non-null  float64
 5   new_cases                              76215 non-null  float64
 6   new_cases_smoothed                     76215 non-null  float64
 7   total_deaths                           76215 non-null  float64
 8   new_deaths                             76215 non-null  float64
 9   new_deaths_smoothed                    76215 non-null  float64
 10  total_cases_per_million                76215 non-null  float64
 11  new_cases_per_million                  76215 non-null  float64
 12  new_cases_smoothed_per_million         76215 non-null  float64
 13  total_deaths_per_million               76215 non-null  float64
 14  new_deaths_per_million                 76215 non-null  float64
 15  new_deaths_smoothed_per_million        76215 non-null  float64
 16  reproduction_rate                      76215 non-null  float64
 17  icu_patients                           76215 non-null  float64
 18  icu_patients_per_million               76215 non-null  float64
 19  hosp_patients                          76215 non-null  float64
 20  hosp_patients_per_million              76215 non-null  float64
 21  weekly_icu_admissions                  76215 non-null  float64
 22  weekly_icu_admissions_per_million      76215 non-null  float64
 23  weekly_hosp_admissions                 76215 non-null  float64
 24  weekly_hosp_admissions_per_million     76215 non-null  float64
 25  new_tests                              76215 non-null  float64
 26  total_tests                            76215 non-null  float64
 27  total_tests_per_thousand               76215 non-null  float64
 28  new_tests_per_thousand                 76215 non-null  float64
 29  new_tests_smoothed                     76215 non-null  float64
 30  new_tests_smoothed_per_thousand        76215 non-null  float64
 31  positive_rate                          76215 non-null  float64
 32  tests_per_case                         76215 non-null  float64
 33  tests_units                            40920 non-null  object
 34  total_vaccinations                     76215 non-null  float64
 35  people_vaccinated                      76215 non-null  float64
 36  people_fully_vaccinated                76215 non-null  float64
 37  new_vaccinations                       76215 non-null  float64
 38  new_vaccinations_smoothed              76215 non-null  float64
 39  total_vaccinations_per_hundred         76215 non-null  float64
 40  people_vaccinated_per_hundred          76215 non-null  float64
 41  people_fully_vaccinated_per_hundred    76215 non-null  float64
 42  new_vaccinations_smoothed_per_million  76215 non-null  float64
 43  stringency_index                       76215 non-null  float64
 44  population                             76215 non-null  float64
 45  population_density                     76215 non-null  float64
 46  median_age                             76215 non-null  float64
 47  aged_65_older                          76215 non-null  float64
 48  aged_70_older                          76215 non-null  float64
 49  gdp_per_capita                         76215 non-null  float64
 50  extreme_poverty                        76215 non-null  float64
 51  cardiovasc_death_rate                  76215 non-null  float64
 52  diabetes_prevalence                    76215 non-null  float64
 53  female_smokers                         76215 non-null  float64
 54  male_smokers                           76215 non-null  float64
 55  handwashing_facilities                 76215 non-null  float64
 56  hospital_beds_per_thousand             76215 non-null  float64
 57  life_expectancy                        76215 non-null  float64
 58  human_development_index                76215 non-null  float64
 59  death_rate                             68898 non-null  float64
 60  population_coverage                    75798 non-null  float64
dtypes: float64(56), object(5)
memory usage: 35.5+ MB
# Create a HighRisk column from total_deaths_per_million (total deaths attributed to COVID-19 per 1,000,000 people):
# a row is flagged as high risk when its z-score is above 0.65
covid_df_copy['HighRisk'] = zscore(covid_df_copy['total_deaths_per_million']) > 0.65
covid_df_copy.head(20)
| | iso_code | continent | location | date | total_cases | new_cases | new_cases_smoothed | total_deaths | new_deaths | new_deaths_smoothed | ... | diabetes_prevalence | female_smokers | male_smokers | handwashing_facilities | hospital_beds_per_thousand | life_expectancy | human_development_index | death_rate | population_coverage | HighRisk |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | AFG | Asia | Afghanistan | 2020-02-24 | 1.0 | 1.0 | 0.000 | 0.0 | 0.0 | 0.0 | ... | 9.59 | 0.0 | 0.0 | 37.746 | 0.5 | 64.83 | 0.511 | NaN | 0.0 | False |
| 1 | AFG | Asia | Afghanistan | 2020-02-25 | 1.0 | 0.0 | 0.000 | 0.0 | 0.0 | 0.0 | ... | 9.59 | 0.0 | 0.0 | 37.746 | 0.5 | 64.83 | 0.511 | NaN | 0.0 | False |
| 2 | AFG | Asia | Afghanistan | 2020-02-26 | 1.0 | 0.0 | 0.000 | 0.0 | 0.0 | 0.0 | ... | 9.59 | 0.0 | 0.0 | 37.746 | 0.5 | 64.83 | 0.511 | NaN | 0.0 | False |
| 3 | AFG | Asia | Afghanistan | 2020-02-27 | 1.0 | 0.0 | 0.000 | 0.0 | 0.0 | 0.0 | ... | 9.59 | 0.0 | 0.0 | 37.746 | 0.5 | 64.83 | 0.511 | NaN | 0.0 | False |
| 4 | AFG | Asia | Afghanistan | 2020-02-28 | 1.0 | 0.0 | 0.000 | 0.0 | 0.0 | 0.0 | ... | 9.59 | 0.0 | 0.0 | 37.746 | 0.5 | 64.83 | 0.511 | NaN | 0.0 | False |
| 5 | AFG | Asia | Afghanistan | 2020-02-29 | 1.0 | 0.0 | 0.143 | 0.0 | 0.0 | 0.0 | ... | 9.59 | 0.0 | 0.0 | 37.746 | 0.5 | 64.83 | 0.511 | 0.0 | 0.0 | False |
| 6 | AFG | Asia | Afghanistan | 2020-03-01 | 1.0 | 0.0 | 0.143 | 0.0 | 0.0 | 0.0 | ... | 9.59 | 0.0 | 0.0 | 37.746 | 0.5 | 64.83 | 0.511 | 0.0 | 0.0 | False |
| 7 | AFG | Asia | Afghanistan | 2020-03-02 | 1.0 | 0.0 | 0.000 | 0.0 | 0.0 | 0.0 | ... | 9.59 | 0.0 | 0.0 | 37.746 | 0.5 | 64.83 | 0.511 | NaN | 0.0 | False |
| 8 | AFG | Asia | Afghanistan | 2020-03-03 | 2.0 | 1.0 | 0.143 | 0.0 | 0.0 | 0.0 | ... | 9.59 | 0.0 | 0.0 | 37.746 | 0.5 | 64.83 | 0.511 | 0.0 | 0.0 | False |
| 9 | AFG | Asia | Afghanistan | 2020-03-04 | 4.0 | 2.0 | 0.429 | 0.0 | 0.0 | 0.0 | ... | 9.59 | 0.0 | 0.0 | 37.746 | 0.5 | 64.83 | 0.511 | 0.0 | 0.0 | False |
| 10 | AFG | Asia | Afghanistan | 2020-03-05 | 4.0 | 0.0 | 0.429 | 0.0 | 0.0 | 0.0 | ... | 9.59 | 0.0 | 0.0 | 37.746 | 0.5 | 64.83 | 0.511 | 0.0 | 0.0 | False |
| 11 | AFG | Asia | Afghanistan | 2020-03-06 | 4.0 | 0.0 | 0.429 | 0.0 | 0.0 | 0.0 | ... | 9.59 | 0.0 | 0.0 | 37.746 | 0.5 | 64.83 | 0.511 | 0.0 | 0.0 | False |
| 12 | AFG | Asia | Afghanistan | 2020-03-07 | 4.0 | 0.0 | 0.429 | 0.0 | 0.0 | 0.0 | ... | 9.59 | 0.0 | 0.0 | 37.746 | 0.5 | 64.83 | 0.511 | 0.0 | 0.0 | False |
| 13 | AFG | Asia | Afghanistan | 2020-03-08 | 5.0 | 1.0 | 0.571 | 0.0 | 0.0 | 0.0 | ... | 9.59 | 0.0 | 0.0 | 37.746 | 0.5 | 64.83 | 0.511 | 0.0 | 0.0 | False |
| 14 | AFG | Asia | Afghanistan | 2020-03-09 | 7.0 | 2.0 | 0.857 | 0.0 | 0.0 | 0.0 | ... | 9.59 | 0.0 | 0.0 | 37.746 | 0.5 | 64.83 | 0.511 | 0.0 | 0.0 | False |
| 15 | AFG | Asia | Afghanistan | 2020-03-10 | 8.0 | 1.0 | 0.857 | 0.0 | 0.0 | 0.0 | ... | 9.59 | 0.0 | 0.0 | 37.746 | 0.5 | 64.83 | 0.511 | 0.0 | 0.0 | False |
| 16 | AFG | Asia | Afghanistan | 2020-03-11 | 11.0 | 3.0 | 1.000 | 0.0 | 0.0 | 0.0 | ... | 9.59 | 0.0 | 0.0 | 37.746 | 0.5 | 64.83 | 0.511 | 0.0 | 0.0 | False |
| 17 | AFG | Asia | Afghanistan | 2020-03-12 | 12.0 | 1.0 | 1.143 | 0.0 | 0.0 | 0.0 | ... | 9.59 | 0.0 | 0.0 | 37.746 | 0.5 | 64.83 | 0.511 | 0.0 | 0.0 | False |
| 18 | AFG | Asia | Afghanistan | 2020-03-13 | 13.0 | 1.0 | 1.286 | 0.0 | 0.0 | 0.0 | ... | 9.59 | 0.0 | 0.0 | 37.746 | 0.5 | 64.83 | 0.511 | 0.0 | 0.0 | False |
| 19 | AFG | Asia | Afghanistan | 2020-03-14 | 15.0 | 2.0 | 1.571 | 0.0 | 0.0 | 0.0 | ... | 9.59 | 0.0 | 0.0 | 37.746 | 0.5 | 64.83 | 0.511 | 0.0 | 0.0 | False |
20 rows × 62 columns
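In the cell above, HighRisk is True when total_deaths_per_million lies more than 0.65 standard deviations above its mean, where the mean and standard deviation are taken over every row (all countries and dates together). A minimal NumPy sketch of the same rule, assuming the population standard deviation (ddof=0), which is also the default of `scipy.stats.zscore`; `high_risk_flags` is a hypothetical name and the values are made up.

```python
import numpy as np


def high_risk_flags(values, threshold=0.65):
    """Return a boolean array: True where the z-score exceeds `threshold`."""
    values = np.asarray(values, dtype=float)
    z = (values - values.mean()) / values.std(ddof=0)   # population std, like scipy's default
    return z > threshold


deaths_per_million = [0.0, 10.0, 20.0, 30.0, 400.0]
print(high_risk_flags(deaths_per_million))   # only the outlying last value is flagged
```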
corr = covid_df_copy[['death_rate', 'total_deaths_per_million', 'aged_65_older','extreme_poverty', 'icu_patients', 'life_expectancy', 'cardiovasc_death_rate', 'diabetes_prevalence', 'human_development_index', 'population_density', 'aged_70_older', 'population_coverage']].corr()
corr.style.background_gradient(cmap='coolwarm')
# total_deaths_per_million = total deaths attributed to COVID-19 per 1,000,000 people.
# With the new data, its strongest correlations are with population_coverage (0.28), aged_70_older (0.26),
# aged_65_older (0.26), cardiovasc_death_rate (-0.21), icu_patients (0.21) and extreme_poverty (-0.19); all are weak.
| | death_rate | total_deaths_per_million | aged_65_older | extreme_poverty | icu_patients | life_expectancy | cardiovasc_death_rate | diabetes_prevalence | human_development_index | population_density | aged_70_older | population_coverage |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| death_rate | 1.000000 | 0.011712 | 0.019838 | -0.004740 | 0.002655 | 0.001072 | 0.007339 | -0.016454 | -0.005875 | 0.000378 | 0.022357 | -0.027813 |
| total_deaths_per_million | 0.011712 | 1.000000 | 0.261580 | -0.190581 | 0.209748 | 0.087003 | -0.210793 | -0.044598 | 0.131226 | -0.024086 | 0.264115 | 0.284074 |
| aged_65_older | 0.019838 | 0.261580 | 1.000000 | -0.314490 | 0.152195 | 0.495638 | -0.061668 | 0.077630 | 0.690167 | -0.038343 | 0.965808 | 0.233574 |
| extreme_poverty | -0.004740 | -0.190581 | -0.314490 | 1.000000 | -0.042952 | -0.145519 | 0.191294 | -0.273647 | -0.253455 | -0.055936 | -0.315749 | -0.129305 |
| icu_patients | 0.002655 | 0.209748 | 0.152195 | -0.042952 | 1.000000 | 0.062059 | -0.068656 | 0.025804 | 0.101677 | -0.012730 | 0.154094 | 0.124600 |
| life_expectancy | 0.001072 | 0.087003 | 0.495638 | -0.145519 | 0.062059 | 1.000000 | 0.170786 | 0.436338 | 0.751472 | 0.123856 | 0.485926 | 0.167732 |
| cardiovasc_death_rate | 0.007339 | -0.210793 | -0.061668 | 0.191294 | -0.068656 | 0.170786 | 1.000000 | 0.244890 | 0.152652 | -0.179121 | -0.098685 | -0.098590 |
| diabetes_prevalence | -0.016454 | -0.044598 | 0.077630 | -0.273647 | 0.025804 | 0.436338 | 0.244890 | 1.000000 | 0.461805 | 0.026118 | 0.018650 | 0.075327 |
| human_development_index | -0.005875 | 0.131226 | 0.690167 | -0.253455 | 0.101677 | 0.751472 | 0.152652 | 0.461805 | 1.000000 | -0.128936 | 0.638597 | 0.249810 |
| population_density | 0.000378 | -0.024086 | -0.038343 | -0.055936 | -0.012730 | 0.123856 | -0.179121 | 0.026118 | -0.128936 | 1.000000 | -0.047520 | 0.002514 |
| aged_70_older | 0.022357 | 0.264115 | 0.965808 | -0.315749 | 0.154094 | 0.485926 | -0.098685 | 0.018650 | 0.638597 | -0.047520 | 1.000000 | 0.225655 |
| population_coverage | -0.027813 | 0.284074 | 0.233574 | -0.129305 | 0.124600 | 0.167732 | -0.098590 | 0.075327 | 0.249810 | 0.002514 | 0.225655 | 1.000000 |
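Reading the matrix above for one target is easier when the other columns are sorted by the absolute value of their correlation, so that negative relationships (such as cardiovasc_death_rate at -0.21) rank alongside positive ones. A minimal pandas sketch; `rank_by_abs_corr` is a hypothetical helper and the toy frame is made-up data, not values from the dataset.

```python
import pandas as pd


def rank_by_abs_corr(df, target):
    """Correlations of every other column with `target`, ordered by |r| (sign kept)."""
    corr = df.corr()[target].drop(target)
    return corr.reindex(corr.abs().sort_values(ascending=False).index)


toy = pd.DataFrame({
    "total_deaths_per_million": [1.0, 2.0, 3.0, 4.0],
    "aged_70_older": [2.0, 4.0, 6.0, 8.1],
    "population_density": [5.0, 1.0, 4.0, 2.0],
    "extreme_poverty": [9.0, 7.0, 4.0, 1.0],
})
ranked = rank_by_abs_corr(toy, "total_deaths_per_million")
print(ranked)
```

On the real data the same call would be `rank_by_abs_corr(covid_df_copy[[...]], 'total_deaths_per_million')` with the column list used for `corr` above.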
predictors = ['diabetes_prevalence','icu_patients','life_expectancy', 'cardiovasc_death_rate', 'human_development_index', 'aged_70_older', 'population_density', 'female_smokers', 'male_smokers', 'extreme_poverty']
target = 'HighRisk'
X = covid_df_copy[predictors].values
y = covid_df_copy[target].values
# Split the data into training and test sets, and scale
scaler = StandardScaler()
# unscaled version (note that scaling is only used on predictor variables)
X_train_raw, X_test_raw, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=42)
# scaled version
X_train = scaler.fit_transform(X_train_raw)
X_test = scaler.transform(X_test_raw)
print('First 10 Rows of Scaled Data: \n\n', X_train[0:10:,], '\n')
knn = KNeighborsClassifier()
knn.fit(X_train, y_train)
predictions = knn.predict(X_test)
accuracy = (predictions == y_test).mean()
print('Accuracy:', round(accuracy * 100, 2), '%')
First 10 Rows of Scaled Data:
[[ 2.16970375e+00 -9.92836241e-02 4.45157032e-01 -6.32460446e-01 7.48536558e-01 -8.31170980e-01 1.00417221e+00 -1.78831296e-01 7.75762571e-01 -4.86384094e-01]
 [ 2.22119711e+00 -9.92836241e-02 3.80613143e-01 1.84617450e-01 5.78461190e-01 -3.67873461e-01 -1.38292306e-01 -6.58327027e-01 1.03379387e+00 -4.80475136e-01]
 [ 1.23814206e+00 -9.92836241e-02 1.24722329e-01 1.64963247e-01 2.86903415e-01 -1.79343481e-01 -1.95825003e-01 -1.89993855e-02 1.06067213e+00 -4.86384094e-01]
 [-1.46525932e+00 -9.92836241e-02 -4.41321860e-01 -3.05476685e-03 -4.94633396e-01 -7.03878840e-01 -1.36518668e-01 -6.98285004e-01 -5.84277375e-01 2.44445898e+00]
 [-7.49386104e-02 -9.92836241e-02 4.80570493e-01 -5.65009433e-01 6.06807084e-01 1.22706265e+00 -1.85802333e-01 6.40307245e-01 -1.75727826e-01 -4.80475136e-01]
 [ 5.44889451e+00 -9.92836241e-02 2.40101670e-01 2.40469736e+00 1.49223355e-01 -1.14928665e+00 -1.47735658e-02 -7.58221971e-01 -1.24548257e+00 -4.86384094e-01]
 [-4.04964233e-01 -9.92836241e-02 -3.06522235e-01 -2.68311729e-01 1.69470423e-01 -4.49065205e-01 -1.69032883e-01 5.09270753e-02 5.39233885e-01 6.30408932e-01]
 [-2.17715653e-01 -9.92836241e-02 1.27007069e-01 7.97293125e-01 2.05915145e-01 -4.49065205e-01 -1.07569740e-01 -4.78516128e-01 2.84538858e+00 -1.49573499e-01]
 [ 1.14217716e+00 -9.92836241e-02 4.68004426e-01 -4.85906499e-01 6.18955325e-01 1.14800514e-02 -1.32914258e-01 6.50296739e-01 9.63910390e-01 -4.74566179e-01]
 [-1.17422146e-02 -9.92836241e-02 -2.81390101e-01 8.06492010e-01 -3.77200403e-01 -6.76126860e-01 -1.88501293e-01 -5.88400566e-01 1.56598341e+00 1.69993030e+00]]
Accuracy: 90.19 %
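The scaler above learns each column's mean and standard deviation from the training rows only and then applies those same numbers to the test rows, so no information from the test set leaks into the scaling. A minimal NumPy sketch of that fit/transform split, assuming the population standard deviation and leaving zero-variance columns unscaled (both match scikit-learn's `StandardScaler`); `fit_standardizer` and `transform` are hypothetical names and the arrays are toy data.

```python
import numpy as np


def fit_standardizer(X_train):
    """Learn per-column mean and population std from the training rows only."""
    X_train = np.asarray(X_train, dtype=float)
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)      # ddof=0
    std[std == 0] = 1.0            # constant columns are centred but not rescaled
    return mean, std


def transform(X, mean, std):
    """Apply previously learned statistics to any rows (train or test)."""
    return (np.asarray(X, dtype=float) - mean) / std


X_train_raw = np.array([[1.0, 10.0], [3.0, 10.0], [5.0, 10.0]])
X_test_raw = np.array([[3.0, 10.0], [7.0, 10.0]])
mean, std = fit_standardizer(X_train_raw)
X_train = transform(X_train_raw, mean, std)
X_test = transform(X_test_raw, mean, std)   # test rows reuse the training statistics
```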
n = 8
accuracies = []
ks = np.arange(1, n + 1, 2)   # odd k values: 1, 3, 5, 7
for k in ks:
    print(k, ' ', end='')
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    predictions = knn.predict(X_test)
    acc = (predictions == y_test).mean()
    accuracies.append(acc)
print('done')
def get_best(ks, accuracies):
    """Return the k value(s) with the highest accuracy, and that accuracy."""
    accuracies = np.asarray(accuracies)
    maximum = accuracies.max()
    return ks[accuracies == maximum], maximum
best_k, best_acc = get_best(ks, accuracies)
print('best k = {}, best accuracy: {:0.3f}%'.format(best_k, best_acc * 100))
1  3  5  7  done
best k = [7], best accuracy: 90.518%
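To make the k search above concrete: a k-nearest-neighbours classifier labels a row by a majority vote among the k training rows closest to it, and with two classes (HighRisk True/False) an odd k rules out tied votes. A minimal NumPy sketch of that idea using Euclidean distance; `knn_predict` is a hypothetical helper, not scikit-learn's implementation, and the points are toy data.

```python
import numpy as np


def knn_predict(X_train, y_train, X_query, k=5):
    """Majority vote among the k nearest training rows (Euclidean distance).

    If two classes ever receive the same number of votes, np.argmax picks the
    smallest label, because np.unique returns labels in sorted order.
    """
    X_train = np.asarray(X_train, dtype=float)
    labels, y_codes = np.unique(np.asarray(y_train), return_inverse=True)
    preds = []
    for q in np.asarray(X_query, dtype=float):
        dist = np.sqrt(((X_train - q) ** 2).sum(axis=1))
        nearest = np.argsort(dist)[:k]                       # indices of the k closest rows
        counts = np.bincount(y_codes[nearest], minlength=len(labels))
        preds.append(labels[np.argmax(counts)])
    return np.array(preds)


X_tr = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]])
y_tr = np.array([False, False, False, True, True, True])
X_q = np.array([[0.2, 0.2], [5.5, 5.5]])
y_q = np.array([False, True])

for k in (1, 3, 5):
    acc = (knn_predict(X_tr, y_tr, X_q, k=k) == y_q).mean()
    print(k, acc)
```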
print('Comparison of predictions to y_test values: \n\n', predictions == y_test)
print('\nPredictions:\n\n', predictions)
print('\nY_test values:\n\n', y_test)
Comparison of predictions to y_test values:
[ True True True ... True True True]
Predictions:
[ True False False ... False False False]
Y_test values:
[ True False False ... False False False]
A closer look at the predictions and the y_test values shows that we would get 90.226% accuracy simply by predicting almost everything as False, so this model's features and data need to be improved.
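An accuracy of about 90% only means something when compared with the accuracy of always predicting the most common label. A minimal NumPy sketch that computes this majority-class baseline and the confusion counts for the positive (HighRisk = True) class; the helper names and the toy labels are hypothetical.

```python
import numpy as np


def majority_baseline_accuracy(y_true):
    """Accuracy of always predicting the most common label."""
    _, counts = np.unique(np.asarray(y_true), return_counts=True)
    return counts.max() / counts.sum()


def confusion_counts(y_true, y_pred):
    """(TP, FP, FN, TN) for boolean labels, with True as the positive class."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = int(np.sum(y_true & y_pred))
    fp = int(np.sum(~y_true & y_pred))
    fn = int(np.sum(y_true & ~y_pred))
    tn = int(np.sum(~y_true & ~y_pred))
    return tp, fp, fn, tn


y_test_toy = np.array([True] + [False] * 9)   # 10% positives
always_false = np.zeros_like(y_test_toy)      # a "model" that never predicts HighRisk
print(majority_baseline_accuracy(y_test_toy))
print(confusion_counts(y_test_toy, always_false))
```

Here the "model" scores 90% accuracy while finding none of the high-risk rows (TP = 0), which is the situation described above; recall on the True class, TP / (TP + FN), exposes it directly.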
predictors = ['diabetes_prevalence','icu_patients','life_expectancy', 'cardiovasc_death_rate','human_development_index', 'aged_70_older', 'population_density', 'female_smokers', 'male_smokers', 'extreme_poverty']
target = 'HighRisk'
X = covid_df_copy[predictors].values
y = covid_df_copy[target].values
# Split the data into training and test sets, and scale
scaler = StandardScaler()
# unscaled version (note that scaling is only used on predictor variables)
X_train_raw, X_test_raw, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=42)
# scaled version
X_train = scaler.fit_transform(X_train_raw)
X_test = scaler.transform(X_test_raw)
ks = np.arange(1, 8, 2)
for k in ks:
    best_acc = 0
    selected = []
    remaining = list(range(X_train.shape[1]))
    n = len(predictors)   # at most every predictor can be selected
    better = True
    while len(selected) < n and better:
        # find the single feature that works best in conjunction
        # with the already selected features
        acc_max = 0
        for i in remaining:
            # training data restricted to the selected features plus feature i
            selectedFi = selected.copy()
            selectedFi.append(i)
            X_si = X_train[:, selectedFi]
            # drop rows with NaN in any chosen column (filter y first, using X's mask)
            y_siTrain = y_train[~np.isnan(X_si).any(axis=1)]
            X_si = X_si[~np.isnan(X_si).any(axis=1)]
            knn = KNeighborsClassifier(n_neighbors=k)
            knn.fit(X_si, y_siTrain)
            X_testSi = X_test[:, selectedFi]
            y_siTest = y_test[~np.isnan(X_testSi).any(axis=1)]
            X_testSi = X_testSi[~np.isnan(X_testSi).any(axis=1)]
            predictions = knn.predict(X_testSi)
            acc = (predictions == y_siTest).mean()
            if acc > acc_max:
                acc_max = acc
                i_best = i
        # keep the best candidate only if it improves on the best accuracy so far
        if acc_max > best_acc:
            best_acc = acc_max
            better = True
        else:
            better = False
        if better:
            remaining.remove(i_best)
            selected.append(i_best)
            print('k: {}; num features: {}; features: {}; bestAcc: {:.2f}%'.format(k, len(selected), [predictors[x] for x in selected], best_acc*100))
k: 1; num features: 1; features: ['icu_patients']; bestAcc: 86.57%
k: 1; num features: 2; features: ['icu_patients', 'male_smokers']; bestAcc: 87.74%
k: 1; num features: 3; features: ['icu_patients', 'male_smokers', 'extreme_poverty']; bestAcc: 88.32%
k: 1; num features: 4; features: ['icu_patients', 'male_smokers', 'extreme_poverty', 'life_expectancy']; bestAcc: 88.42%
k: 1; num features: 5; features: ['icu_patients', 'male_smokers', 'extreme_poverty', 'life_expectancy', 'aged_70_older']; bestAcc: 88.46%
k: 3; num features: 1; features: ['icu_patients']; bestAcc: 86.88%
k: 3; num features: 2; features: ['icu_patients', 'life_expectancy']; bestAcc: 89.28%
k: 3; num features: 3; features: ['icu_patients', 'life_expectancy', 'female_smokers']; bestAcc: 89.80%
k: 3; num features: 4; features: ['icu_patients', 'life_expectancy', 'female_smokers', 'human_development_index']; bestAcc: 89.83%
k: 3; num features: 5; features: ['icu_patients', 'life_expectancy', 'female_smokers', 'human_development_index', 'population_density']; bestAcc: 90.09%
k: 5; num features: 1; features: ['life_expectancy']; bestAcc: 87.97%
k: 5; num features: 2; features: ['life_expectancy', 'icu_patients']; bestAcc: 89.88%
k: 5; num features: 3; features: ['life_expectancy', 'icu_patients', 'female_smokers']; bestAcc: 90.33%
k: 7; num features: 1; features: ['population_density']; bestAcc: 88.38%
k: 7; num features: 2; features: ['population_density', 'icu_patients']; bestAcc: 89.67%
k: 7; num features: 3; features: ['population_density', 'icu_patients', 'female_smokers']; bestAcc: 90.22%
k: 7; num features: 4; features: ['population_density', 'icu_patients', 'female_smokers', 'human_development_index']; bestAcc: 90.53%
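The loop above is greedy forward feature selection: each round adds the one remaining feature that scores best together with those already chosen, and it stops as soon as no candidate improves the score. A minimal, library-free sketch of that procedure with the scoring step passed in as a function; `forward_select` and the toy `score` are hypothetical. One design choice worth noting: the loop above scores candidates on the same test set that later reports the final accuracy, so scoring on a separate validation split or with cross-validation would keep that final number unbiased.

```python
def forward_select(n_features, score, max_features=None):
    """Greedy forward selection.

    `score` maps a list of feature indices to a number (higher is better).
    Returns the selected indices, the best score, and the per-round history.
    """
    max_features = n_features if max_features is None else max_features
    selected, remaining = [], list(range(n_features))
    best_score = float("-inf")
    history = []
    while remaining and len(selected) < max_features:
        round_best, round_feature = float("-inf"), None
        for i in remaining:
            s = score(selected + [i])
            if s > round_best:               # strict '>' keeps the first feature on ties
                round_best, round_feature = s, i
        if round_best <= best_score:         # no improvement: stop
            break
        best_score = round_best
        selected.append(round_feature)
        remaining.remove(round_feature)
        history.append((list(selected), best_score))
    return selected, best_score, history


# Toy score: each feature adds a fixed amount; feature 2 hurts
weights = {0: 0.5, 1: 0.3, 2: -0.1, 3: 0.05}
selected, best, history = forward_select(4, lambda feats: sum(weights[f] for f in feats))
print(selected, round(best, 2))
```

In the notebook's setting, `score` would fit a `KNeighborsClassifier` on the chosen columns and return its validation accuracy.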
Features of interest: 'aged_70_older', 'icu_patients', 'life_expectancy', 'cardiovasc_death_rate'
# change default plot size
rcParams['figure.figsize'] = 10,8
sns.scatterplot(data=covid_df_copy, x='male_smokers', y='cardiovasc_death_rate', hue='HighRisk', style='HighRisk');
plt.xlabel("Male Smokers")
plt.ylabel("Cardiovascular Death Rate");
sns.scatterplot(data=covid_df_copy, x='life_expectancy', y='diabetes_prevalence', hue='HighRisk', style='HighRisk')
plt.xlabel("Life Expectancy")
plt.ylabel("Diabetes Prevalence (%)");
sns.scatterplot(data=covid_df_copy, x='human_development_index', y='positive_rate', hue='HighRisk', style='HighRisk');
plt.xlabel("Human Development Index")
plt.ylabel("Positive Rate");
sns.scatterplot(data=covid_df_copy, x='aged_65_older', y='diabetes_prevalence', hue='HighRisk', style='HighRisk');
plt.xlabel("% Age 65 and older")
plt.ylabel("Diabetes Prevalence (%)");
k = 7
predictors = ['diabetes_prevalence','icu_patients','female_smokers','male_smokers', 'human_development_index', 'life_expectancy','cardiovasc_death_rate','positive_rate']
target = 'HighRisk'
# unscaled version (note that scaling is only used on predictor variables)
X = covid_df_copy[predictors].values
y = covid_df_copy[target].values
# Split the data into training and test sets, and scale
scaler = StandardScaler()
X_train_raw, X_test_raw, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=42)
# scaled version
X_train = scaler.fit_transform(X_train_raw)
X_test = scaler.transform(X_test_raw)
#Remove data rows with nan
y_train = y_train[~np.isnan(X_train).any(axis=1)]
X_train = X_train[~np.isnan(X_train).any(axis=1)]
y_test = y_test[~np.isnan(X_test).any(axis=1)]
X_test = X_test[~np.isnan(X_test).any(axis=1)]
knn = KNeighborsClassifier(n_neighbors=k)
knn.fit(X_train, y_train)
predictions = knn.predict(X_test)
acc = (predictions == y_test).mean()
print('Comparison of predictions to y_test values: \n\n', predictions == y_test)
print('\nPredictions:\n\n', predictions)
print('\nY_test values:\n\n', y_test)
print('\nAccuracy: ', acc)
Comparison of predictions to y_test values:
[ True True True ... True True True]
Predictions:
[ True False False ... False False False]
Y_test values:
[ True False False ... False False False]
Accuracy:  0.937109118740433
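The final cell drops rows that contain NaN after scaling; this works because `y` is filtered with the mask computed from `X` before `X` itself is overwritten. (scikit-learn's `StandardScaler` ignores NaN when fitting and keeps it when transforming, so the NaN rows survive scaling.) A minimal NumPy sketch of a helper that computes the mask once and applies it to both arrays, which removes that ordering pitfall; `drop_nan_rows` is a hypothetical name and the arrays are toy data.

```python
import numpy as np


def drop_nan_rows(X, y):
    """Drop rows of X that contain any NaN, together with the matching entries of y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    keep = ~np.isnan(X).any(axis=1)   # one mask, computed once, applied to both arrays
    return X[keep], y[keep]


X = np.array([[1.0, 2.0], [np.nan, 0.5], [3.0, 4.0]])
y = np.array([True, False, False])
X_clean, y_clean = drop_nan_rows(X, y)
```

In the notebook this would read `X_train, y_train = drop_nan_rows(X_train, y_train)` and likewise for the test arrays.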
Two questions guide this daily updated publication on the global COVID-19 pandemic:
How can we make progress against the pandemic? And, are we making progress? To answer these questions we need data. But data is not enough. This is especially true in this pandemic because even the best available data is far from perfect. Much of our work therefore focuses on explaining what the data can – and can not – tell us about the pandemic.
Our goal is two-fold:
To provide reliable, global and open data and research on how the COVID-19 pandemic is spreading, what impact the pandemic has, how we can make progress against the pandemic, and whether the measures countries are taking are successful or not; And to build an infrastructure that allows research colleagues – and everyone who is interested – to navigate and understand this data and research. Before we study how to make progress we should consider the more basic question: is it possible to do so?
The answer is very clear: While some countries have failed in their response to the pandemic, others met the challenge much more successfully. Perhaps the most important thing to know about the pandemic is that it is possible to fight the pandemic.
Responding successfully means two things: limiting the direct and the indirect impact of the pandemic. Countries that have responded most successfully were able to avoid choosing between the two: they avoided the trade-off between a high mortality and a high socio-economic impact of the pandemic. New Zealand has been able to bring infections down and open up their country internally. Other island nations were also able to almost entirely prevent an outbreak (like Taiwan, Australia, and Iceland). But not only islands were able to bend the curve of infections and prevent large outbreaks – Norway, Uruguay, Switzerland, South Korea, and Germany are examples. These countries suffered a smaller direct impact, but they also limited the indirect impacts because they were able to release lockdown measures earlier.
Together with colleagues at the Robert Koch Institute, the Chan School of Public Health, the UK Public Health Rapid Support Team, the London School of Hygiene and Tropical Medicine and other institutions we study countries that responded most successfully in detail.
Among the countries with the highest death toll are some of the most populous countries in the world such as the US, Brazil, and Mexico.
We can see three different ways in which the pandemic has affected countries:
While some commentaries on the pandemic have the premise that all countries failed to respond well to the pandemic, the exact opposite stands out to us: Even at this early stage of the pandemic we see very large differences between countries – as the chart shows. While some suffer terrible outbreaks others have managed to contain rapid outbreaks or even prevented bad outbreaks entirely. It is possible to respond successfully to the pandemic.
Fighting the pandemic: What can every one of us do to flatten the curve? Some measures against the pandemic are beyond what any individual can do. The development of a vaccine, R&D in pharmaceutical research, building the infrastructure to allow large-scale testing, and coordinated policy responses require large-scale collaboration and are society-wide efforts. We will explore these later.
But, as with all big problems, there are many ways to make progress and some of the most important measures are up to all of us.
In the fight against the pandemic we are in the fortunate situation that what is good for ourselves is also good for everyone else. By protecting yourself you are slowing the spread of the pandemic.
You and everyone else have the same two clear personal goals during this pandemic: Don’t get infected and don’t infect others.
To not get infected you have to do what you can to prevent the virus from entering your body through your mouth, nose, or eyes. To not infect others your goal is to prevent the virus from traveling from your body to the mouth, nose or eyes of somebody else.
What can you do? How can all of us – you and me – do our part to flatten the curve? The three main measures are called the three Ws: Wash your hands, wear a mask, watch your distance.
1- https://www.geeksforgeeks.org/python-programming-language/?ref=leftbar
2- https://www.python-course.eu/python3_class_and_instance_attributes.php
3- https://thispointer.com/data-analysis-in-python-using-pandas/
4- https://jovian.ml/learn/data-analysis-with-python-zero-to-pandas
5- https://ourworldindata.org/coronavirus